Re: [PATCH v2 00/11] Optimization to improve CPU online/offline on Powerpc

2020-10-06 Thread Michael Ellerman
On Mon, 21 Sep 2020 15:26:42 +0530, Srikar Dronamraju wrote:
> Here are some optimizations and fixes to make CPU online/offline
> faster and hence result in faster bootup.
> 
> It's based on top of my v5 coregroup support patchset.
> https://lore.kernel.org/linuxppc-dev/20200810071834.92514-1-sri...@linux.vnet.ibm.com/t/#u
> 
> Anton reported that his 4096-CPU system (1024 cores in a single socket)
> was taking too long to boot. His analysis showed that most of the time
> was being spent updating cpu_core_mask.
> 
> [...]

Applied to powerpc/next.

[01/11] powerpc/topology: Update topology_core_cpumask
        https://git.kernel.org/powerpc/c/4bce545903fa0290e011cf118997717f0c4f4d20
[02/11] powerpc/smp: Stop updating cpu_core_mask
        https://git.kernel.org/powerpc/c/4ca234a9cbd7c3a656b34dd98c8b156f70ed7849
[03/11] powerpc/smp: Remove get_physical_package_id
        https://git.kernel.org/powerpc/c/e29e9ed665eeb6f98cd88672994ecf4aaefdb943
[04/11] powerpc/smp: Optimize remove_cpu_from_masks
        https://git.kernel.org/powerpc/c/70edd4a7c753ba18e3e4bb9e97b6d85156cea738
[05/11] powerpc/smp: Limit CPUs traversed to within a node.
        https://git.kernel.org/powerpc/c/53516d4abacfab1faaa075c1f79957abc3da358c
[06/11] powerpc/smp: Stop passing mask to update_mask_by_l2
        https://git.kernel.org/powerpc/c/1f3a4181042107e32e44047e9dde990aced845b5
[07/11] powerpc/smp: Depend on cpu_l1_cache_map when adding CPUs
        https://git.kernel.org/powerpc/c/661e3d42f99193b7fdd71467a87e48f6e597c285
[08/11] powerpc/smp: Check for duplicate topologies and consolidate
        https://git.kernel.org/powerpc/c/375370a10d061d5c75c6bc5b09c5db4cc0b0fcfe
[09/11] powerpc/smp: Optimize update_mask_by_l2
        https://git.kernel.org/powerpc/c/3ab33d6dc3e98e83b55732049e1d1d488207bb6d
[10/11] powerpc/smp: Move coregroup mask updation to a new function
        https://git.kernel.org/powerpc/c/b8a97cb4599cda28bd3b3bc13042f5803b42ad65
[11/11] powerpc/smp: Optimize update_coregroup_mask
        https://git.kernel.org/powerpc/c/70a94089d7f7fa91bc1795622426b3ed017ec71a

cheers


[PATCH v2 00/11] Optimization to improve CPU online/offline on Powerpc

2020-09-21 Thread Srikar Dronamraju
Here are some optimizations and fixes to make CPU online/offline
faster and hence result in faster bootup.

It's based on top of my v5 coregroup support patchset.
https://lore.kernel.org/linuxppc-dev/20200810071834.92514-1-sri...@linux.vnet.ibm.com/t/#u

Anton reported that his 4096-CPU system (1024 cores in a single socket)
was taking too long to boot. His analysis showed that most of the time
was being spent updating cpu_core_mask.

The first two patches should solve Anton's immediate problem.
With an unofficial version of these patches, Anton reported that boot time
came down from 30 minutes to 6 seconds (basically on a high core count,
single socket configuration). Satheesh also reported similar numbers.

The rest are cleanups/optimizations.
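
To give a feel for why the traversal scope matters (see the "Limit CPUs
traversed to within a node" patch in this series), here is a toy cost
model. It is not the kernel code: the function name scan_cost is made up,
and it assumes CPUs come online in numeric order, that nodes cover
contiguous CPU ranges, and that each onlining CPU compares itself against
the CPUs that are already online.

```python
def scan_cost(nr_cpus: int, cpus_per_node: int, limit_to_node: bool) -> int:
    """Count CPU-to-CPU comparisons while onlining CPUs one at a time.

    Hypothetical model: CPU ids are brought up in increasing order and
    node N owns the contiguous range [N * cpus_per_node, (N + 1) * cpus_per_node).
    """
    checks = 0
    for cpu in range(nr_cpus):
        if limit_to_node:
            # Only CPUs already online in the same node are examined.
            checks += cpu % cpus_per_node
        else:
            # Every CPU that is already online is examined.
            checks += cpu
    return checks


if __name__ == "__main__":
    # 1024 CPUs in 16 nodes of 64 CPUs each.
    everything = scan_cost(1024, 64, limit_to_node=False)
    per_node = scan_cost(1024, 64, limit_to_node=True)
    print(f"scan all online CPUs : {everything} comparisons")
    print(f"scan within the node : {per_node} comparisons")
```

The model only counts comparisons; it says nothing about how expensive
each comparison is in the real code, so it should not be read as a
prediction of the measured speedup.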

Since cpu_core_mask has been an exported symbol for a long time, let's
retain it as a snapshot of cpumask_of_node.

$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              1024
On-line CPU(s) list: 0-1023
Thread(s) per core:  8
Core(s) per socket:  8
Socket(s):           16
NUMA node(s):        16
Model:               2.0 (pvr 004d 0200)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
NUMA node2 CPU(s):   128-191
NUMA node3 CPU(s):   192-255
NUMA node4 CPU(s):   256-319
NUMA node5 CPU(s):   320-383
NUMA node6 CPU(s):   384-447
NUMA node7 CPU(s):   448-511
NUMA node8 CPU(s):   512-575
NUMA node9 CPU(s):   576-639
NUMA node10 CPU(s):  640-703
NUMA node11 CPU(s):  704-767
NUMA node12 CPU(s):  768-831
NUMA node13 CPU(s):  832-895
NUMA node14 CPU(s):  896-959
NUMA node15 CPU(s):  960-1023

$ dmesg -k | grep -i -e Bringing -e Brought -e sysrq -e bug
With powerpc/next
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.354971] smp: Bringing up secondary CPUs ...
[  233.354676] smp: Brought up 16 nodes, 1024 CPUs
[  330.023073] sysrq: Changing Loglevel
[  330.023101] sysrq: Loglevel set to 9

With +patchset
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.351703] smp: Bringing up secondary CPUs ...
[    4.059859] smp: Brought up 16 nodes, 1024 CPUs
[   98.309015] sysrq: Changing Loglevel
[   98.309044] sysrq: Loglevel set to 9

Observations:
CPU bringup time dropped from 233 seconds to 4 seconds on this 1024 CPU
system, which in turn reduced system boot time from 330 seconds to
98 seconds. The actual improvement will depend on your system topology.
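
For anyone who wants to repeat the comparison on their own logs, the
bringup time is just the difference between the "Bringing up secondary
CPUs" and "Brought up" timestamps. A minimal sketch follows; the helper
name bringup_seconds and the BASELINE/PATCHED variables are made up for
illustration, and the log lines are the ones quoted above.

```python
import re

# Matches a dmesg line such as "[  233.354676] smp: Brought up 16 nodes, 1024 CPUs".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def bringup_seconds(log: str) -> float:
    """Return seconds between the start and end of secondary CPU bringup."""
    start = end = None
    for line in log.splitlines():
        m = STAMP.match(line.strip())
        if not m:
            continue
        stamp, msg = float(m.group(1)), m.group(2)
        if msg.startswith("smp: Bringing up secondary CPUs"):
            start = stamp
        elif msg.startswith("smp: Brought up"):
            end = stamp
    if start is None or end is None:
        raise ValueError("bringup markers not found in log")
    return end - start


BASELINE = """
[    0.354971] smp: Bringing up secondary CPUs ...
[  233.354676] smp: Brought up 16 nodes, 1024 CPUs
"""

PATCHED = """
[    0.351703] smp: Bringing up secondary CPUs ...
[    4.059859] smp: Brought up 16 nodes, 1024 CPUs
"""

if __name__ == "__main__":
    before = bringup_seconds(BASELINE)
    after = bringup_seconds(PATCHED)
    print(f"powerpc/next : {before:.1f} s")
    print(f"+patchset    : {after:.1f} s")
    print(f"speedup      : {before / after:.0f}x")
```

On the numbers above this works out to roughly 233 seconds versus
3.7 seconds of secondary CPU bringup.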

Topology verification post patchset on a 2 node Power9 PowerVM LPAR

powerpc/next                                                    +patchset
--------------------------------------------------------------------------------------------------------------------------
$ lscpu
Architecture:        ppc64le                                    Architecture:        ppc64le
Byte Order:          Little Endian                              Byte Order:          Little Endian
CPU(s):              128                                        CPU(s):              128
On-line CPU(s) list: 0-127                                      On-line CPU(s) list: 0-127
Thread(s) per core:  8                                          Thread(s) per core:  8
Core(s) per socket:  8                                          Core(s) per socket:  8
Socket(s):           2                                          Socket(s):           2
NUMA node(s):        2                                          NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)                        Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported    Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp                                       Hypervisor vendor:   pHyp
Virtualization type: para                                       Virtualization type: para
L1d cache:           32K                                        L1d cache:           32K
L1i cache:           32K                                        L1i cache:           32K
L2 cache:            512K                                       L2 cache:            512K
L3 cache:            10240K                                     L3 cache:            10240K
NUMA node0 CPU(s):   0-63                                       NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127                                     NUMA node1 CPU(s):   64-127

$ tail -f /proc/cpuinfo
processor       : 127                                           processor       : 127
cpu             : POWER9 (architected), altivec supported       cpu             : POWER9 (architected), altivec supported
clock           : 3000.00MHz                                    clock           : 3000.00MHz
revision        : 2.2 (pvr 004e 0202)                           revision        :