Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-09-10 Thread Scott Cheloha
On Fri, Aug 21, 2020 at 10:33:10AM +0200, Laurent Dufour wrote:
> Le 11/08/2020 à 03:51, Scott Cheloha a écrit :
> > 
> > [...]
> > 
> > @@ -631,7 +638,7 @@ static int dlpar_memory_remove_by_ic(u32 
> > lmbs_to_remove, u32 drc_index)
> >   static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >   {
> > unsigned long block_sz;
> > -   int rc;
> > +   int nid, rc;
> > 
> > if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > return -EINVAL;
> > @@ -642,11 +649,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> > return rc;
> > }
> > 
> > -   lmb_set_nid(lmb);
> > block_sz = memory_block_size_bytes();
> > 
> > +   /* Find the node id for this address. */
> > +   nid = memory_add_physaddr_to_nid(lmb->base_addr);
> 
> I think we could be more efficient here.
> Here is the call stack behind memory_add_physaddr_to_nid():
> 
>   memory_add_physaddr_to_nid(lmb->base_addr)
> hot_add_scn_to_nid()
>   if (of_find_node_by_path("/ibm,dynamic-reconfiguration-memory")) == 
> true*
>   then
> hot_add_drconf_scn_to_nid()
>   for_each_drmem_lmb() to find the LMB based on lmb->base_addr
> of_drconf_to_nid_single(found LMB)
>   use lmb->aa_index to get the nid.
> 
> * that test is necessarily true when called from dlpar_add_lmb()
> otherwise the call to update_lmb_associativity_index() would have
> failed earlier.
> 
> Basically, we have a LMB and we later walk all the LMBs to find that lmb
> again.  In the case of dlpar_add_lmb(), it would be more efficient to
> directly call of_drconf_to_nid_single(). That function is not exported
> from arch/powerpc/mm/numa.c but it may be good to export it through that
> patch.

I've posted a patch for this:

https://lore.kernel.org/linuxppc-dev/20200910175637.2865160-1-chel...@linux.ibm.com/T/#u

The speedup is nice, especially for LMBs at the end of the array.


Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-09-09 Thread Michael Ellerman
On Mon, 10 Aug 2020 20:51:15 -0500, Scott Cheloha wrote:
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block.  There is no need to store the nid
> in multiple locations.
> 
> Note that lmb_to_memblock() uses find_memory_block() to get the
> corresponding memory_block.  As find_memory_block() runs in sub-linear
> time this approach is negligibly slower than what we do at present.
> 
> [...]

Applied to powerpc/next.

[1/1] pseries/drmem: don't cache node id in drmem_lmb struct
  https://git.kernel.org/powerpc/c/e5e179aa3a39c818db8fbc2dce8d2cd24adaf657

cheers


Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-21 Thread Laurent Dufour

Le 11/08/2020 à 03:51, Scott Cheloha a écrit :

At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c00a4980 LR: c00a4940 CTR: 
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  82009033   CR: 44000248  
XER: 000d
[   80.604431] CFAR: c00a4a38 IRQMASK: 0
[   80.604431] GPR00: c00a4940 c0002dbff8493ac0 c1904400 
c0003cfede30
[   80.604431] GPR04:  c0f4095a 002f 
1000
[   80.604431] GPR08: cbf7ecdb7fb8 cbf7ecc2d3c8 0008 
c00c0002fdfb2001
[   80.604431] GPR12:  c0001e8ec200
[   80.604477] NIP [c00a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c00a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c00a4940] 
hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c0087c10] 
memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c10d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c0010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c10c4aa0] 
kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c0010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c000b648] 
ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e949 e90a000a e92a 80ea000c 1d080018 3908ffe8 
7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e949 
7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Signed-off-by: Scott Cheloha 
---
v1:
  - RFC

v2:
  - Adjusted commit message.
  - Miscellaneous cleanup.

v3:
  - Correct issue found by Laurent Dufour :
- Add missing put_device() call in dlpar_remove_lmb() for the
  lmb's associated mem_block.

  arch/powerpc/include/asm/drmem.h  | 21 
  arch/powerpc/mm/drmem.c   |  6 +
  .../platforms/pseries/hotplug-memory.c| 24 ---
  3 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209f45bb..34e4e9b257f5 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-   int nid;
-#endif
  };

  struct drmem_lmb_info {
@@ -104,22 +101,4 @@ static inline void 
invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
lmb->aa_index = 0x;
  }

-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
  #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
if (!drmem_info->lmbs)
return;

-   for_each_drmem_lmb(lmb) {
+   for_each_drmem_lmb(lmb)
read_drconf_v1_cell(lmb, );
-   lmb_set_nid(lmb);
-   }
  }

  static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)

lmb->aa_index = dr_cell.aa_index;
  

Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-14 Thread Nathan Lynch
Nathan Lynch  writes:
> I'm satisfied that this change does not reintroduce that problem. The
> removal path still derives the node without going to the device tree,
> but now performs a more efficient lookup.

Er, actually it's slightly less efficient in the remove path, as you
note in your commit message. This doesn't alter my opinion of the
change. The higher-level algorithmic inefficiencies (e.g. all the
for_each_drmem_lmb loops) in this code are more of a concern.


Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-14 Thread Nathan Lynch
Scott Cheloha  writes:
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block.  There is no need to store the nid
> in multiple locations.
>
> Note that lmb_to_memblock() uses find_memory_block() to get the
> corresponding memory_block.  As find_memory_block() runs in sub-linear
> time this approach is negligibly slower than what we do at present.
>
> In exchange for this lookup at hot-remove time we no longer need to
> call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
> On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
> spares us an O(n^2) initialization during boot.
>
> On systems with many LMBs that initialization overhead is palpable and
> disruptive.  For example, on a box with 249854 LMBs we're seeing
> drmem_init() take upwards of 30 seconds to complete:
>
> [   53.721639] drmem: initializing drmem v2
> [   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! 
> [swapper/0:1]
> [   80.604377] Modules linked in:
> [   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
> [   80.604397] NIP:  c00a4980 LR: c00a4940 CTR: 
> 
> [   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
> [   80.604412] MSR:  82009033   CR: 
> 44000248  XER: 000d
> [   80.604431] CFAR: c00a4a38 IRQMASK: 0
> [   80.604431] GPR00: c00a4940 c0002dbff8493ac0 c1904400 
> c0003cfede30
> [   80.604431] GPR04:  c0f4095a 002f 
> 1000
> [   80.604431] GPR08: cbf7ecdb7fb8 cbf7ecc2d3c8 0008 
> c00c0002fdfb2001
> [   80.604431] GPR12:  c0001e8ec200
> [   80.604477] NIP [c00a4980] hot_add_scn_to_nid+0xa0/0x3e0
> [   80.604486] LR [c00a4940] hot_add_scn_to_nid+0x60/0x3e0
> [   80.604492] Call Trace:
> [   80.604498] [c0002dbff8493ac0] [c00a4940] 
> hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
> [   80.604509] [c0002dbff8493b20] [c0087c10] 
> memory_add_physaddr_to_nid+0x20/0x60
> [   80.604521] [c0002dbff8493b40] [c10d4880] drmem_init+0x25c/0x2f0
> [   80.604530] [c0002dbff8493c10] [c0010154] 
> do_one_initcall+0x64/0x2c0
> [   80.604540] [c0002dbff8493ce0] [c10c4aa0] 
> kernel_init_freeable+0x2d8/0x3a0
> [   80.604550] [c0002dbff8493db0] [c0010824] kernel_init+0x2c/0x148
> [   80.604560] [c0002dbff8493e20] [c000b648] 
> ret_from_kernel_thread+0x5c/0x74
> [   80.604567] Instruction dump:
> [   80.604574] 392918e8 e949 e90a000a e92a 80ea000c 1d080018 3908ffe8 
> 7d094214
> [   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac 
> e949 7fbe5040
> [   89.047390] drmem: 249854 LMB(s)
>
> With a patched kernel on the same machine we're no longer seeing the
> soft lockup.  drmem_init() now completes in negligible time, even when
> the LMB count is large.

This has been a real annoyance, and it's great to see it fixed in a way
that both simplifies the code and reduces data structure size.

>
> Signed-off-by: Scott Cheloha 

I think this also should have:

Fixes: b2d3b5ee66f2 ("powerpc/pseries: Track LMB nid instead of using device 
tree")

since you're essentially reverting it.

The problem that commit was trying to address arose from the fact that
the removal path was determining the Linux node for a block from the
firmware description of the block:

memory_add_physaddr_to_nid
  hot_add_scn_to_nid
hot_add_drconf_scn_to_nid
  of_drconf_to_nid_single
[parses the DT for the nid]

However, this can become out of sync with Linux's NUMA assignment for
the memory due to VM migration or other events.

I'm satisfied that this change does not reintroduce that problem. The
removal path still derives the node without going to the device tree,
but now performs a more efficient lookup.

Reviewed-by: Nathan Lynch 


[PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-10 Thread Scott Cheloha
At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c00a4980 LR: c00a4940 CTR: 
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  82009033   CR: 44000248  
XER: 000d
[   80.604431] CFAR: c00a4a38 IRQMASK: 0
[   80.604431] GPR00: c00a4940 c0002dbff8493ac0 c1904400 
c0003cfede30
[   80.604431] GPR04:  c0f4095a 002f 
1000
[   80.604431] GPR08: cbf7ecdb7fb8 cbf7ecc2d3c8 0008 
c00c0002fdfb2001
[   80.604431] GPR12:  c0001e8ec200
[   80.604477] NIP [c00a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c00a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c00a4940] 
hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c0087c10] 
memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c10d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c0010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c10c4aa0] 
kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c0010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c000b648] 
ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e949 e90a000a e92a 80ea000c 1d080018 3908ffe8 
7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e949 
7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Signed-off-by: Scott Cheloha 
---
v1:
 - RFC

v2:
 - Adjusted commit message.
 - Miscellaneous cleanup.

v3:
 - Correct issue found by Laurent Dufour :
   - Add missing put_device() call in dlpar_remove_lmb() for the
 lmb's associated mem_block.

 arch/powerpc/include/asm/drmem.h  | 21 
 arch/powerpc/mm/drmem.c   |  6 +
 .../platforms/pseries/hotplug-memory.c| 24 ---
 3 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209f45bb..34e4e9b257f5 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-   int nid;
-#endif
 };
 
 struct drmem_lmb_info {
@@ -104,22 +101,4 @@ static inline void 
invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
lmb->aa_index = 0x;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
 #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
if (!drmem_info->lmbs)
return;
 
-   for_each_drmem_lmb(lmb) {
+   for_each_drmem_lmb(lmb)
read_drconf_v1_cell(lmb, );
-   lmb_set_nid(lmb);
-   }
 }
 
 static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
-
-