Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-04-18 Thread David Gibson
On Thu, Apr 08, 2021 at 11:51:36PM -0300, Leonardo Bras wrote:
> Hello David, thanks for the feedback!
> 
> On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > > +{
> > > +	/*
> > > +	 * Resizing-up HPT should never fail, but there are some cases
> > > +	 * system starts with higher SHIFT than required, and we go
> > > +	 * through the funny case of resizing HPT down while adding memory
> > > +	 */
> > > +
> > > +	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > > +		newsize *= 2;
> > > +		pr_warn("Hash collision while resizing HPT\n");
> > 
> > This unbounded increase in newsize makes me nervous - we should be
> > bounded by the current size of the HPT at least.  In practice we
> > should be fine, since the resize should always succeed by the time we
> > reach our current HPT size, but that's far from obvious from this
> > point in the code.
> 
> Sure, I will add bounds in v2.
> 
> > 
> > And... you're doubling newsize which is a value which might not be a
> > power of 2.  I'm wondering if there's an edge case where this could
> > actually cause us to skip the current size and erroneously resize to
> > one bigger than we have currently.
> 
> I also thought that at the start, but it seems quite reliable.
> Before using this value, htab_shift_for_mem_size() will always round it
> up to the next power of 2.
> Ex.
> Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for shift
> calculation. If we multiply it by 2 (same as << 1), anything between
> 0b01010 and 0b10000 will be rounded to 0b10000.

Ah, good point.

> This works just fine as long as we are multiplying.
> Division could behave the way you describe, as 0b0101 >> 1 would become
> 0b0010 and skip a shift.
>   
> > > +void memory_batch_expand_prepare(unsigned long newsize)
> > 
> > This wrapper doesn't seem useful.
> 
> Yeah, it does little, but I can't just jump into the hash_* functions
> directly from hotplug-memory.c without even knowing whether hash
> pagetables are in use. (That is, unless the suggestion is to test for
> disable_radix inside hash_memory_batch_*.)
> 
> > 
> > > +{
> > > + if (!radix_enabled())
> > > + hash_memory_batch_expand_prepare(newsize);
> > > +}
> > >  #endif /* CONFIG_MEMORY_HOTPLUG */
> > >  
> > > 
> > > +	memory_batch_expand_prepare(memblock_phys_mem_size() +
> > > +				    drmem_info->n_lmbs * drmem_lmb_size());
> > 
> > This doesn't look right.  memory_add_by_index() is adding a *single*
> > LMB, I think using drmem_info->n_lmbs here means you're counting this
> > as adding again as much memory as you already have hotplugged.
> 
> Yeah, my mistake. This makes sense.
> I will change it to something like 
> memblock_phys_mem_size() + drmem_lmb_size()
> 
> > > 
> > > +	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
> > > +
> > >  	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> > >  		if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > >  			continue;
> > 
> > I don't see memory_batch_expand_prepare() suppressing any existing HPT
> > resizes.  Won't this just resize to the right size for the full add,
> > then resize several times again as we perform the add?  Or.. I guess
> > that will be suppressed by patch 1/3. 
> 
> Correct.
> 
> >  That seems kinda fragile, though.
> 
> What do you mean by fragile here?
> What would you suggest doing different?
> 
> Best regards,
> Leonardo Bras
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-04-08 Thread Leonardo Bras
Hello David, thanks for the feedback!

On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > +{
> > +	/*
> > +	 * Resizing-up HPT should never fail, but there are some cases
> > +	 * system starts with higher SHIFT than required, and we go
> > +	 * through the funny case of resizing HPT down while adding memory
> > +	 */
> > +
> > +	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > +		newsize *= 2;
> > +		pr_warn("Hash collision while resizing HPT\n");
> 
> This unbounded increase in newsize makes me nervous - we should be
> bounded by the current size of the HPT at least.  In practice we
> should be fine, since the resize should always succeed by the time we
> reach our current HPT size, but that's far from obvious from this
> point in the code.

Sure, I will add bounds in v2.
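
For reference, a minimal sketch of what a bounded version of the loop
might look like (this assumes ppc64_pft_size still holds the current HPT
shift and that htab_shift_for_mem_size() maps a memory size to its target
HPT shift, as in hash_utils.c; the actual v2 may differ):

void hash_memory_batch_expand_prepare(unsigned long newsize)
{
	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
		/*
		 * Sketch only: once the target shift reaches the current
		 * HPT shift, the resize is a no-op and cannot collide, so
		 * bail out instead of growing newsize forever.
		 */
		if (htab_shift_for_mem_size(newsize) >= ppc64_pft_size)
			break;
		newsize *= 2;
		pr_warn("Hash collision while resizing HPT\n");
	}
}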

> 
> And... you're doubling newsize which is a value which might not be a
> power of 2.  I'm wondering if there's an edge case where this could
> actually cause us to skip the current size and erroneously resize to
> one bigger than we have currently.

I also thought that at the start, but it seems quite reliable.
Before using this value, htab_shift_for_mem_size() will always round it
up to the next power of 2.
Ex.
Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for shift
calculation. If we multiply it by 2 (same as << 1), anything between
0b01010 and 0b10000 will be rounded to 0b10000.

This works just fine as long as we are multiplying.
Division could behave the way you describe, as 0b0101 >> 1 would become
0b0010 and skip a shift.
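
To make that concrete, here is a small userspace sketch (round_up_pow2()
is a stand-in for the rounding step inside htab_shift_for_mem_size(), not
the actual kernel helper):

#include <assert.h>
#include <stdio.h>

/* Stand-in for the round-to-next-power-of-2 step performed by
 * htab_shift_for_mem_size(); not the kernel implementation. */
static unsigned long round_up_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned long x;

	/* Doubling before rounding never skips a power of two:
	 * round_up_pow2(2 * x) == 2 * round_up_pow2(x) for all x > 0. */
	for (x = 1; x < (1UL << 20); x++)
		assert(round_up_pow2(2 * x) == 2 * round_up_pow2(x));

	printf("doubling never skips a power-of-two size\n");
	return 0;
}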

> > +void memory_batch_expand_prepare(unsigned long newsize)
> 
> This wrapper doesn't seem useful.

Yeah, it does little, but I can't just jump into the hash_* functions
directly from hotplug-memory.c without even knowing whether hash
pagetables are in use. (That is, unless the suggestion is to test for
disable_radix inside hash_memory_batch_*.)

> 
> > +{
> > +   if (!radix_enabled())
> > +   hash_memory_batch_expand_prepare(newsize);
> > +}
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  
> > 
> > +	memory_batch_expand_prepare(memblock_phys_mem_size() +
> > +				    drmem_info->n_lmbs * drmem_lmb_size());
> 
> This doesn't look right.  memory_add_by_index() is adding a *single*
> LMB, I think using drmem_info->n_lmbs here means you're counting this
> as adding again as much memory as you already have hotplugged.

Yeah, my mistake. This makes sense.
I will change it to something like 
memblock_phys_mem_size() + drmem_lmb_size()
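
i.e. presumably something like this in dlpar_memory_add_by_index() (a
sketch of the intended v2 shape, not the final patch):

	/* A single LMB is being hot-added, so prepare the HPT for exactly
	 * one LMB's worth of extra memory: */
	memory_batch_expand_prepare(memblock_phys_mem_size() + drmem_lmb_size());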

> > 
> > +	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
> > +
> >  	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> >  		if (lmb->flags & DRCONF_MEM_ASSIGNED)
> >  			continue;
> 
> I don't see memory_batch_expand_prepare() suppressing any existing HPT
> resizes.  Won't this just resize to the right size for the full add,
> then resize several times again as we perform the add?  Or.. I guess
> that will be suppressed by patch 1/3. 

Correct.

>  That seems kinda fragile, though.

What do you mean by fragile here?
What would you suggest doing different?

Best regards,
Leonardo Bras



Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-03-22 Thread David Gibson
On Fri, Mar 12, 2021 at 04:29:40AM -0300, Leonardo Bras wrote:
> Every time a memory hotplug happens, and the memory limit crosses a 2^n
> value, it may be necessary to perform HPT resizing-up, which can take
> some time (over 100ms in my tests).
> 
> It usually is not an issue, but it adds up if a lot of memory is added
> to a guest with little starting memory:
> Adding 256GB to a 2GB guest, for example, will require 8 HPT resizes
> (the target size doubles at each power-of-two boundary crossed: 4, 8,
> 16, 32, 64, 128, 256 and 512 GB).
> 
> Perform an HPT resize before memory hotplug, updating the HPT to its
> final size (assuming a successful hotplug), bringing the number of
> HPT resizes down to at most one per memory hotplug action.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/include/asm/book3s/64/hash.h       |  2 ++
>  arch/powerpc/include/asm/sparsemem.h            |  2 ++
>  arch/powerpc/mm/book3s64/hash_utils.c           | 14 ++++++++++++++
>  arch/powerpc/mm/book3s64/pgtable.c              |  6 ++++++
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  6 ++++++
>  5 files changed, 30 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index d959b0195ad9..843b0a178590 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -255,6 +255,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  				 int nid, pgprot_t prot);
>  int hash__remove_section_mapping(unsigned long start, unsigned long end);
>  
> +void hash_memory_batch_expand_prepare(unsigned long newsize);
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index d072866842e4..16b5f5300c84 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -17,6 +17,8 @@ extern int remove_section_mapping(unsigned long start, unsigned long end);
>  extern int memory_add_physaddr_to_nid(u64 start);
>  #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
>  
> +void memory_batch_expand_prepare(unsigned long newsize);
> +
>  #ifdef CONFIG_NUMA
>  extern int hot_add_scn_to_nid(unsigned long scn_addr);
>  #else
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index cfb3ec164f56..1f6aa0bf27e7 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -858,6 +858,20 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  
>   return rc;
>  }
> +
> +void hash_memory_batch_expand_prepare(unsigned long newsize)
> +{
> +	/*
> +	 * Resizing-up HPT should never fail, but there are some cases
> +	 * system starts with higher SHIFT than required, and we go
> +	 * through the funny case of resizing HPT down while adding memory
> +	 */
> +
> +	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> +		newsize *= 2;
> +		pr_warn("Hash collision while resizing HPT\n");

This unbounded increase in newsize makes me nervous - we should be
bounded by the current size of the HPT at least.  In practice we
should be fine, since the resize should always succeed by the time we
reach our current HPT size, but that's far from obvious from this
point in the code.

And... you're doubling newsize which is a value which might not be a
power of 2.  I'm wondering if there's an edge case where this could
actually cause us to skip the current size and erroneously resize to
one bigger than we have currently.

> +	}
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  static void __init hash_init_partition_table(phys_addr_t hash_table,
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 5b3a3bae21aa..f1cd8af0f67f 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -193,6 +193,12 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
>  
>   return hash__remove_section_mapping(start, end);
>  }
> +
> +void memory_batch_expand_prepare(unsigned long newsize)

This wrapper doesn't seem useful.

> +{
> +	if (!radix_enabled())
> +		hash_memory_batch_expand_prepare(newsize);
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  void __init mmu_partition_table_init(void)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 8377f1f7c78e..353c71249214 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -671,6 +671,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>  	if (lmbs_available < lmbs_to_add)
>  		return -EINVAL;
>  
> +	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
> +
>   

[PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-03-11 Thread Leonardo Bras
Every time a memory hotplug happens, and the memory limit crosses a 2^n
value, it may be necessary to perform HPT resizing-up, which can take
some time (over 100ms in my tests).

It usually is not an issue, but it adds up if a lot of memory is added
to a guest with little starting memory:
Adding 256GB to a 2GB guest, for example, will require 8 HPT resizes
(the target size doubles at each power-of-two boundary crossed: 4, 8,
16, 32, 64, 128, 256 and 512 GB).

Perform an HPT resize before memory hotplug, updating the HPT to its
final size (assuming a successful hotplug), bringing the number of
HPT resizes down to at most one per memory hotplug action.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/include/asm/book3s/64/hash.h       |  2 ++
 arch/powerpc/include/asm/sparsemem.h            |  2 ++
 arch/powerpc/mm/book3s64/hash_utils.c           | 14 ++++++++++++++
 arch/powerpc/mm/book3s64/pgtable.c              |  6 ++++++
 arch/powerpc/platforms/pseries/hotplug-memory.c |  6 ++++++
 5 files changed, 30 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index d959b0195ad9..843b0a178590 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -255,6 +255,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 				 int nid, pgprot_t prot);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
+void hash_memory_batch_expand_prepare(unsigned long newsize);
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index d072866842e4..16b5f5300c84 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,6 +17,8 @@ extern int remove_section_mapping(unsigned long start, unsigned long end);
 extern int memory_add_physaddr_to_nid(u64 start);
 #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
 
+void memory_batch_expand_prepare(unsigned long newsize);
+
 #ifdef CONFIG_NUMA
 extern int hot_add_scn_to_nid(unsigned long scn_addr);
 #else
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index cfb3ec164f56..1f6aa0bf27e7 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -858,6 +858,20 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
 
return rc;
 }
+
+void hash_memory_batch_expand_prepare(unsigned long newsize)
+{
+	/*
+	 * Resizing-up HPT should never fail, but there are some cases
+	 * system starts with higher SHIFT than required, and we go
+	 * through the funny case of resizing HPT down while adding memory
+	 */
+
+	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
+		newsize *= 2;
+		pr_warn("Hash collision while resizing HPT\n");
+	}
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __init hash_init_partition_table(phys_addr_t hash_table,
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 5b3a3bae21aa..f1cd8af0f67f 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -193,6 +193,12 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
 
return hash__remove_section_mapping(start, end);
 }
+
+void memory_batch_expand_prepare(unsigned long newsize)
+{
+	if (!radix_enabled())
+		hash_memory_batch_expand_prepare(newsize);
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 void __init mmu_partition_table_init(void)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 8377f1f7c78e..353c71249214 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -671,6 +671,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 	if (lmbs_available < lmbs_to_add)
 		return -EINVAL;
 
+	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
+
 	for_each_drmem_lmb(lmb) {
 		if (lmb->flags & DRCONF_MEM_ASSIGNED)
 			continue;
@@ -734,6 +736,8 @@ static int dlpar_memory_add_by_index(u32 drc_index)
 
 	pr_info("Attempting to hot-add LMB, drc index %x\n", drc_index);
 
+	memory_batch_expand_prepare(memblock_phys_mem_size() +
+				    drmem_info->n_lmbs * drmem_lmb_size());
 	lmb_found = 0;
 	for_each_drmem_lmb(lmb) {
 		if (lmb->drc_index == drc_index) {
@@ -788,6 +792,8 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 	if (lmbs_available < lmbs_to_add)
 		return -EINVAL;
 
+	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
+
 	for_each_drmem_lmb_in_range(lmb,