RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-22 Thread Song Bao Hua (Barry Song)


> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
> Sent: Friday, January 22, 2021 11:05 PM
> To: Song Bao Hua (Barry Song); Valentin Schneider; Meelis Roos; LKML
> Cc: Peter Zijlstra; Vincent Guittot; Mel Gorman
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> 
> On 21/01/2021 22:17, Song Bao Hua (Barry Song) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
> >> Sent: Friday, January 22, 2021 7:54 AM
> >> To: Valentin Schneider; Meelis Roos; LKML
> >> Cc: Peter Zijlstra; Vincent Guittot; Song Bao Hua (Barry Song); Mel Gorman
> >> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> >>
> >> On 21/01/2021 19:21, Valentin Schneider wrote:
> >>> On 21/01/21 19:39, Meelis Roos wrote:
> 
> [...]
> 
> >> # cat /sys/devices/system/node/node*/distance
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >> The '16' seems to be the culprit. What does such a topology look like?
> 
> Maybe like this:
> 
>   _
>   |   |
> .-6   0   4-.
> |  \ / \ /  |
> |   1   2   |
> |   \\  |
> --7  35 |
>   |  ||_|
>   |___|
> 
> >
> > Once we get a topology like this:
> >
> >
> >  +------+    +------+    +-------+    +------+
> >  | node |    | node |    | node  |    | node |
> >  |      +----+      +----+       +----+      |
> >  +------+    +------+    +-------+    +------+
> >
> > We can reproduce this issue.
> > For example, every cpu with the below numa_distance can have
> > "groups don't span domain->span":
> > node   0   1   2   3
> >   0:  10  12  20  22
> >   1:  12  10  22  24
> >   2:  20  22  10  12
> >   3:  22  24  12  10
>  2 20 2
> So this should look like: 1 --- 0  2 --- 3

Yes. So here we are facing another problem:
kernel/sched/topology.c has an assumption that:
node_distance(0,j) includes all distances in 
node_distance(i,j).

void sched_init_numa(void)
{
	...
	/*
	 * ...
	 *
	 * Assumes node_distance(0,j) includes all distances in
	 * node_distance(i,j) in order to avoid cubic time.
	 */
	next_distance = curr_distance;
	for (i = 0; i < nr_node_ids; i++) {
		for (j = 0; j < nr_node_ids; j++) {
			for (k = 0; k < nr_node_ids; k++) {
				...
			}
		}
	}
	...
}

but obviously that is not the case here. Right now we are seeing some
performance degradation because of this, so I will probably start another
thread for it.
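
To make the broken assumption concrete, here is a minimal Python sketch (not kernel code) that checks whether row 0 of a distance table contains every distance that occurs anywhere in the table. The table is the 4-node example quoted above; the helper name `missing_from_row0` is made up for this illustration.

```python
# 4-node distance table quoted in this thread (rows/columns are nodes 0..3).
DIST = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]


def missing_from_row0(dist):
    """Return the distances that occur in the table but not in row 0."""
    all_distances = {d for row in dist for d in row}
    return sorted(all_distances - set(dist[0]))


# Hypothetical helper: a non-empty result means that looking only at
# node_distance(0, j) would miss at least one distance level.
print(missing_from_row0(DIST))  # [24]
```

For this table the distance 24 (between nodes 1 and 3) never appears in row 0, so a scan restricted to node 0 would not see it.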

Thanks
Barry



RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-22 Thread Valentin Schneider
On 22/01/21 11:09, Song Bao Hua (Barry Song) wrote:
>> -----Original Message-----
>> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
>> > For example, every cpu with the below numa_distance can have
>> > "groups don't span domain->span":
>> > node   0   1   2   3
>> >   0:  10  12  20  22
>> >   1:  12  10  22  24
>> >   2:  20  22  10  12
>> >   3:  22  24  12  10
>>                              2     20     2
>> So this should look like: 1 --- 0 ---- 2 --- 3
>
> Yes. So here we are facing another problem:
> kernel/sched/topology.c has an assumption that:
> node_distance(0,j) includes all distances in
> node_distance(i,j).
>
> void sched_init_numa(void)
> {
> 	...
> 	/*
> 	 * ...
> 	 *
> 	 * Assumes node_distance(0,j) includes all distances in
> 	 * node_distance(i,j) in order to avoid cubic time.
> 	 */
> 	next_distance = curr_distance;
> 	for (i = 0; i < nr_node_ids; i++) {
> 		for (j = 0; j < nr_node_ids; j++) {
> 			for (k = 0; k < nr_node_ids; k++) {
> 				...
> 			}
> 		}
> 	}
> 	...
> }
>
> but obviously that is not the case here. Right now we are seeing some
> performance degradation because of this, so I will probably start another
> thread for it.
>

It's not too difficult to solve that one; I must still have a patch lying
around somewhere that uses a bitmap. This relies on the ACPI spec stating that
distance values are 8-bit, which gives us a reasonable bound for the bitmap size.

Let me fish this out.
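
As a rough illustration of the bitmap idea (a Python sketch, not the patch mentioned above), every distance can be recorded as one bit in a 256-bit map during a single pass over the table, and the set bits then give the unique distances in sorted order. The names `MAX_DISTANCE` and `unique_distances` are assumptions made for this example only.

```python
# Assumption for this sketch: distances fit in 8 bits, so 256 bits suffice.
MAX_DISTANCE = 256

# 4-node distance table quoted earlier in this thread.
DIST = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]


def unique_distances(dist):
    """Collect the distinct distances in one pass over the whole table."""
    bitmap = 0  # bit d is set once distance d has been seen
    for row in dist:
        for d in row:
            if not 0 <= d < MAX_DISTANCE:
                raise ValueError(f"distance {d} does not fit in 8 bits")
            bitmap |= 1 << d
    # Walking the bits in index order yields the distances already sorted.
    return [d for d in range(MAX_DISTANCE) if (bitmap >> d) & 1]


print(unique_distances(DIST))  # [10, 12, 20, 22, 24]
```

This visits each table entry once, so it stays quadratic in the number of nodes without relying on row 0 containing every distance.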

> Thanks
> Barry


Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-22 Thread Dietmar Eggemann
On 21/01/2021 22:17, Song Bao Hua (Barry Song) wrote:
> 
> 
>> -----Original Message-----
>> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
>> Sent: Friday, January 22, 2021 7:54 AM
>> To: Valentin Schneider; Meelis Roos; LKML
>> Cc: Peter Zijlstra; Vincent Guittot; Song Bao Hua (Barry Song); Mel Gorman
>> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
>>
>> On 21/01/2021 19:21, Valentin Schneider wrote:
>>> On 21/01/21 19:39, Meelis Roos wrote:

[...]

>> # cat /sys/devices/system/node/node*/distance
>> 10 12 12 14 14 14 14 16
>> 12 10 14 12 14 14 12 14
>> 12 14 10 14 12 12 14 14
>> 14 12 14 10 12 12 14 14
>> 14 14 12 12 10 14 12 14
>> 14 14 12 12 14 10 14 12
>> 14 12 14 14 12 14 10 12
>> 16 14 14 14 14 12 12 10
>>
>> The '16' seems to be the culprit. What does such a topology look like?

Maybe like this:

  _
  |   |
.-6   0   4-.
|  \ / \ /  |
|   1   2   |
|   \\  |
--7  35 |
  |  ||_|
  |___|

> 
> Once we get a topology like this:
> 
> 
>  +------+    +------+    +-------+    +------+
>  | node |    | node |    | node  |    | node |
>  |      +----+      +----+       +----+      |
>  +------+    +------+    +-------+    +------+
> 
> We can reproduce this issue. 
> For example, every cpu with the below numa_distance can have 
> "groups don't span domain->span":
> node   0   1   2   3
>   0:  10  12  20  22
>   1:  12  10  22  24
>   2:  20  22  10  12
>   3:  22  24  12  10
                             2     20     2
So this should look like: 1 --- 0 ---- 2 --- 3
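
To see how this line-shaped topology looks from each node, here is a small Python sketch that lists, for every distinct distance, which nodes each node can reach within that distance. It assumes (for illustration only) that a node's span at a given level is simply the set of nodes no farther away than that level's distance; `spans_per_level` is a made-up helper name.

```python
# 4-node distance table quoted above.
DIST = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]


def spans_per_level(dist):
    """Map each distinct distance d to the per-node sets of nodes within d."""
    levels = sorted({d for row in dist for d in row})
    return {
        d: [{j for j, dij in enumerate(row) if dij <= d} for row in dist]
        for d in levels
    }


spans = spans_per_level(DIST)
for d, per_node in spans.items():
    print(d, [sorted(s) for s in per_node])
```

At distance 20, node 0 already sees nodes 0-2 while node 1 still only sees nodes 0-1, so nodes at the same level end up with different spans. That asymmetry may be what makes groups built from one node's view fail to line up with another node's domain.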


RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Song Bao Hua (Barry Song)


> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
> Sent: Friday, January 22, 2021 7:54 AM
> To: Valentin Schneider; Meelis Roos; LKML
> Cc: Peter Zijlstra; Vincent Guittot; Song Bao Hua (Barry Song); Mel Gorman
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> 
> On 21/01/2021 19:21, Valentin Schneider wrote:
> > On 21/01/21 19:39, Meelis Roos wrote:
> >>> Could you paste the output of the below?
> >>>
> >>>$ cat /sys/devices/system/node/node*/distance
> >>
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >
> > Thanks!
> >
> >>
> >>> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
> >>> appending 'sched_debug' to your cmdline should yield some extra data.
> >>
> >> [0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2)
> >> (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian)
> >> 2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
> >> [0.00] Command line:
> >> BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro
> >> quiet
> >
> > This is missing 'sched_debug' to get the extra topology debug prints (yes
> > it needs an extra cmdline argument on top of having CONFIG_SCHED_DEBUG=y),
> > but I should be able to generate those locally by feeding QEMU the above
> > distance table.
> 
> Can be recreated with (simplified with only 1 CPU per node):
> 
> $ qemu-system-aarch64 -kernel /opt/git/kernel_org/arch/arm64/boot/Image -hda
> /opt/git/tools/qemu-imgs-manipulator/images/qemu-image-aarch64.img -append
> 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -nographic -machine
> virt,gic-version=max -smp cores=8 -m 512 -cpu cortex-a57 -numa
> node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, -numa node,cpus=2,nodeid=2,
> -numa node,cpus=3,nodeid=3, -numa node,cpus=4,nodeid=4, -numa
> node,cpus=5,nodeid=5, -numa node,cpus=6,nodeid=6, -numa node,cpus=7,nodeid=7,
> -numa dist,src=0,dst=1,val=12, -numa dist,src=0,dst=2,val=12, -numa
> dist,src=0,dst=3,val=14, -numa dist,src=0,dst=4,val=14, -numa
> dist,src=0,dst=5,val=14, -numa dist,src=0,dst=6,val=14, -numa
> dist,src=0,dst=7,val=16, -numa dist,src=1,dst=2,val=14, -numa
> dist,src=1,dst=3,val=12, -numa dist,src=1,dst=4,val=14, -numa
> dist,src=1,dst=5,val=14, -numa dist,src=1,dst=6,val=12, -numa
> dist,src=1,dst=7,val=14, -numa dist,src=2,dst=3,val=14, -numa
> dist,src=2,dst=4,val=12, -numa dist,src=2,dst=5,val=12, -numa
> dist,src=2,dst=6,val=14, -numa dist,src=2,dst=7,val=14, -numa
> dist,src=3,dst=4,val=12, -numa dist,src=3,dst=5,val=12, -numa
> dist,src=3,dst=6,val=14, -numa dist,src=3,dst=7,val=14, -numa
> dist,src=4,dst=5,val=14, -numa dist,src=4,dst=6,val=12, -numa
> dist,src=4,dst=7,val=14, -numa dist,src=5,dst=6,val=14, -numa
> dist,src=5,dst=7,val=12, -numa dist,src=6,dst=7,val=12
> 
> [0.206628] ------------[ cut here ]------------
> [0.206698] Shortest NUMA path spans too many nodes
> [0.207119] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:753
> cpu_attach_domain+0x42c/0x87c
> [0.207176] Modules linked in:
> [0.207373] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.11.0-rc2-00010-g65bcf072e20e-dirty #81
> [0.207458] Hardware name: linux,dummy-virt (DT)
> [0.207584] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [0.207618] pc : cpu_attach_domain+0x42c/0x87c
> [0.207646] lr : cpu_attach_domain+0x42c/0x87c
> [0.207665] sp : 800011fcbbf0
> [0.207679] x29: 800011fcbbf0 x28: 024d8200
> [0.207735] x27: 1fef x26: 1917
> [0.207755] x25: 024d8000 x24: 1917
> [0.207772] x23:  x22: 800011b69a40
> [0.207789] x21: 024d8320 x20: 8000116fda80
> [0.207806] x19: 024d8000 x18: 
> [0.207822] x17:  x16: bd30d762
> [0.207838] x15: 0030 x14: 
> [0.207855] x13: 800011b82e08 x12: 01b9
> [0.207871] x11: 0093 x10: 800011bdae08
> [0.207887] x9 : f000 x8 : 800011b82e08
> [0.207922] x7 : 800011bdae08 x6 : 
> [0.207939] x5 :  x4 : 
> [0.207955] x3 :  x2 : 
> [0.207972] x1 :  x0 : 18

Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Dietmar Eggemann
On 21/01/2021 19:21, Valentin Schneider wrote:
> On 21/01/21 19:39, Meelis Roos wrote:
>>> Could you paste the output of the below?
>>>
>>>$ cat /sys/devices/system/node/node*/distance
>>
>> 10 12 12 14 14 14 14 16
>> 12 10 14 12 14 14 12 14
>> 12 14 10 14 12 12 14 14
>> 14 12 14 10 12 12 14 14
>> 14 14 12 12 10 14 12 14
>> 14 14 12 12 14 10 14 12
>> 14 12 14 14 12 14 10 12
>> 16 14 14 14 14 12 12 10
>>
> 
> Thanks!
> 
>>
>>> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
>>> appending 'sched_debug' to your cmdline should yield some extra data.
>>
>> [0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) 
>> (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 
>> 2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
>> [0.00] Command line: 
>> BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro 
>> quiet
> 
> This is missing 'sched_debug' to get the extra topology debug prints (yes
> it needs an extra cmdline argument on top of having CONFIG_SCHED_DEBUG=y),
> but I should be able to generate those locally by feeding QEMU the above
> distance table.

Can be recreated with (simplified with only 1 CPU per node):

$ qemu-system-aarch64 -kernel /opt/git/kernel_org/arch/arm64/boot/Image -hda 
/opt/git/tools/qemu-imgs-manipulator/images/qemu-image-aarch64.img -append 
'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -nographic -machine 
virt,gic-version=max -smp cores=8 -m 512 -cpu cortex-a57 -numa 
node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, -numa node,cpus=2,nodeid=2, 
-numa node,cpus=3,nodeid=3, -numa node,cpus=4,nodeid=4, -numa 
node,cpus=5,nodeid=5, -numa node,cpus=6,nodeid=6, -numa node,cpus=7,nodeid=7, 
-numa dist,src=0,dst=1,val=12, -numa dist,src=0,dst=2,val=12, -numa 
dist,src=0,dst=3,val=14, -numa dist,src=0,dst=4,val=14, -numa 
dist,src=0,dst=5,val=14, -numa dist,src=0,dst=6,val=14, -numa 
dist,src=0,dst=7,val=16, -numa dist,src=1,dst=2,val=14, -numa 
dist,src=1,dst=3,val=12, -numa dist,src=1,dst=4,val=14, -numa 
dist,src=1,dst=5,val=14, -numa dist,src=1,dst=6,val=12, -numa 
dist,src=1,dst=7,val=14, -numa dist,src=2,dst=3,val=14, -numa 
dist,src=2,dst=4,val=12, -numa dist,src=2,dst=5,val=12, -numa 
dist,src=2,dst=6,val=14, -numa dist,src=2,dst=7,val=14, -numa 
dist,src=3,dst=4,val=12, -numa dist,src=3,dst=5,val=12, -numa 
dist,src=3,dst=6,val=14, -numa dist,src=3,dst=7,val=14, -numa 
dist,src=4,dst=5,val=14, -numa dist,src=4,dst=6,val=12, -numa 
dist,src=4,dst=7,val=14, -numa dist,src=5,dst=6,val=14, -numa 
dist,src=5,dst=7,val=12, -numa dist,src=6,dst=7,val=12
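
For anyone who wants to feed a different table to QEMU, the `-numa` arguments above can be generated instead of typed by hand. The following is a minimal Python sketch, assuming one CPU per node and a symmetric table so that only the upper triangle is emitted, as in the command above; `qemu_numa_args` is a made-up helper name.

```python
# 8-node distance table reported in this thread.
DIST = [
    [10, 12, 12, 14, 14, 14, 14, 16],
    [12, 10, 14, 12, 14, 14, 12, 14],
    [12, 14, 10, 14, 12, 12, 14, 14],
    [14, 12, 14, 10, 12, 12, 14, 14],
    [14, 14, 12, 12, 10, 14, 12, 14],
    [14, 14, 12, 12, 14, 10, 14, 12],
    [14, 12, 14, 14, 12, 14, 10, 12],
    [16, 14, 14, 14, 14, 12, 12, 10],
]


def qemu_numa_args(dist):
    """Build '-numa node,...' and '-numa dist,...' arguments from a table."""
    n = len(dist)
    args = []
    # One CPU per node, numbered like the node itself (a simplification).
    for node in range(n):
        args += ["-numa", f"node,cpus={node},nodeid={node}"]
    # Upper triangle only: each unordered pair of nodes is listed once.
    for src in range(n):
        for dst in range(src + 1, n):
            args += ["-numa", f"dist,src={src},dst={dst},val={dist[src][dst]}"]
    return args


print(" ".join(qemu_numa_args(DIST)))
```

The printed string can be appended to the rest of the `qemu-system-aarch64` command line (kernel, disk image, `-smp`, and so on), which this sketch does not generate.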

[0.206628] ------------[ cut here ]------------
[0.206698] Shortest NUMA path spans too many nodes
[0.207119] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:753 
cpu_attach_domain+0x42c/0x87c
[0.207176] Modules linked in:
[0.207373] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.11.0-rc2-00010-g65bcf072e20e-dirty #81
[0.207458] Hardware name: linux,dummy-virt (DT)
[0.207584] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[0.207618] pc : cpu_attach_domain+0x42c/0x87c
[0.207646] lr : cpu_attach_domain+0x42c/0x87c
[0.207665] sp : 800011fcbbf0
[0.207679] x29: 800011fcbbf0 x28: 024d8200 
[0.207735] x27: 1fef x26: 1917 
[0.207755] x25: 024d8000 x24: 1917 
[0.207772] x23:  x22: 800011b69a40 
[0.207789] x21: 024d8320 x20: 8000116fda80 
[0.207806] x19: 024d8000 x18:  
[0.207822] x17:  x16: bd30d762 
[0.207838] x15: 0030 x14:  
[0.207855] x13: 800011b82e08 x12: 01b9 
[0.207871] x11: 0093 x10: 800011bdae08 
[0.207887] x9 : f000 x8 : 800011b82e08 
[0.207922] x7 : 800011bdae08 x6 :  
[0.207939] x5 :  x4 :  
[0.207955] x3 :  x2 :  
[0.207972] x1 :  x0 : 1802 
[0.208125] Call trace:
[0.208230]  cpu_attach_domain+0x42c/0x87c
[0.208256]  build_sched_domains+0x1238/0x12f4
[0.208271]  sched_init_domains+0x80/0xb0
[0.208283]  sched_init_smp+0x30/0x80
[0.208299]  kernel_init_freeable+0xf4/0x238
[0.208313]  kernel_init+0x14/0x118
[0.208328]  ret_from_fork+0x10/0x34
[0.208507] ---[ end trace 75cafa7c7d1a3d7e ]---
[0.208706] CPU0 attaching sched-domain(s):
[0.208756]  domain-0: span=0-2 level=NUMA
[0.209001]   groups: 0:{ span=0 cap=1017 }, 1:{ span=1 cap=1016 }, 2:{ 
span=2 cap=1015 }
[0.209247]   domain-1: span=0-6 level=NUMA
[0.209280]groups: 0:{ span=0-2 mask=0 cap=3048 }, 3:{ span=1,3-5 mask=3 
cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }
[0.209693] ERROR: groups don't span domain->span
[0.209703]domain-2: span=0-7 level=NUMA
[0.209722] groups: 0:{ span=0-6 

Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Valentin Schneider
On 21/01/21 19:39, Meelis Roos wrote:
>> Could you paste the output of the below?
>>
>>$ cat /sys/devices/system/node/node*/distance
>
> 10 12 12 14 14 14 14 16
> 12 10 14 12 14 14 12 14
> 12 14 10 14 12 12 14 14
> 14 12 14 10 12 12 14 14
> 14 14 12 12 10 14 12 14
> 14 14 12 12 14 10 14 12
> 14 12 14 14 12 14 10 12
> 16 14 14 14 14 12 12 10
>

Thanks!

>
>> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
>> appending 'sched_debug' to your cmdline should yield some extra data.
>
> [0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) 
> (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 
> 2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
> [0.00] Command line: 
> BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro 
> quiet

This is missing 'sched_debug' to get the extra topology debug prints (yes
it needs an extra cmdline argument on top of having CONFIG_SCHED_DEBUG=y),
but I should be able to generate those locally by feeding QEMU the above
distance table.


Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Meelis Roos




> Could you paste the output of the below?
>
>    $ cat /sys/devices/system/node/node*/distance


10 12 12 14 14 14 14 16
12 10 14 12 14 14 12 14
12 14 10 14 12 12 14 14
14 12 14 10 12 12 14 14
14 14 12 12 10 14 12 14
14 14 12 12 14 10 14 12
14 12 14 14 12 14 10 12
16 14 14 14 14 12 12 10



> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
> appending 'sched_debug' to your cmdline should yield some extra data.


[0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) 
(gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 
2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro quiet
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00099bff] usable
[0.00] BIOS-e820: [mem 0x00099c00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e6000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7f9] usable
[0.00] BIOS-e820: [mem 0xd7fae000-0xd7fa] type 9
[0.00] BIOS-e820: [mem 0xd7fb-0xd7fbdfff] ACPI data
[0.00] BIOS-e820: [mem 0xd7fbe000-0xd7fe] ACPI NVS
[0.00] BIOS-e820: [mem 0xd7ff-0xd7ff] reserved
[0.00] BIOS-e820: [mem 0xdc00-0xefff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff70-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x002027ff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: Sun Microsystems Sun Fire X4600 M2/Sun Fire X4600 M2, 
BIOS 0ABIT132 12/03/2009
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2293.794 MHz processor
[0.005734] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.005740] e820: remove [mem 0x000a-0x000f] usable
[0.011432] AGP: No AGP bridge found
[0.011578] last_pfn = 0x2028000 max_arch_pfn = 0x4
[0.011601] MTRR default type: uncachable
[0.011604] MTRR fixed ranges enabled:
[0.011607]   0-9 write-back
[0.011610]   A-E uncachable
[0.011612]   F-F write-protect
[0.011614] MTRR variable ranges enabled:
[0.011616]   0 base  mask 8000 write-back
[0.011620]   1 base 8000 mask C000 write-back
[0.011623]   2 base C000 mask F000 write-back
[0.011626]   3 base D000 mask F800 write-back
[0.011629]   4 disabled
[0.011630]   5 disabled
[0.011632]   6 disabled
[0.011633]   7 disabled
[0.011634] TOM2: 00202800 aka 131712M
[0.012697] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.013048] e820: update [mem 0xd800-0x] usable ==> reserved
[0.013083] last_pfn = 0xd7fa0 max_arch_pfn = 0x4
[0.018157] found SMP MP-table at [mem 0x000ff780-0x000ff78f]
[0.018215] Using GB pages for direct mapping
[0.018603] ACPI: Early table checksum verification disabled
[0.018613] ACPI: RSDP 0x000F9EE0 24 (v02 SUN   )
[0.018623] ACPI: XSDT 0xD7FB0100 9C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018635] ACPI: FACP 0xD7FB0290 F4 (v03 SUNX4600 M2 
0132 MSFT 0097)
[0.018645] ACPI BIOS Warning (bug): 32/64X length mismatch in 
FADT/Gpe0Block: 64/32 (20201113/tbfadt-564)
[0.018652] ACPI BIOS Warning (bug): 32/64X length mismatch in 
FADT/Gpe1Block: 128/64 (20201113/tbfadt-564)
[0.018658] ACPI: DSDT 0xD7FB0710 007DF7 (v01 SUNX4600 M2 
0132 INTL 20051117)
[0.018664] ACPI: FACS 0xD7FBE000 40
[0.018667] ACPI: FACS 0xD7FBE000 40
[0.018671] ACPI: APIC 0xD7FB0390 000170 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018676] ACPI: SPCR 0xD7FB0500 50 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018681] ACPI: MCFG 0xD7FB0550 3C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018686] ACPI: SLIT 0xD7FB064C 6C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018691] ACPI: SPMI 0xD7FB06C0 41 (v05 SUNOEMSPMI  
0132 MSFT 0097)
[0.018695] ACPI: OEMB 0xD7FBE040 63 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018700] ACPI: SRAT 0xD7FB8510 0003C0 (v01 AMDFAM_F_10 
0002 AMD  0001)
[0.018705] ACPI: HPET 0xD7FB88D0 38 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018709] ACPI: IPET 0xD7FB8910 38 (v01 SUN

Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Valentin Schneider


(+Cc relevant folks)

Hi,

On 21/01/21 15:41, Meelis Roos wrote:
> This happens on Sun Fire X4600 M2 - 32 cores in 8 CPU slots. 5.10 was silent.
> Current git and 5.10.0-13256-g5814bc2d4cc2 exhibit this message in dmesg but
> otherwise seem to work fine (kernel compilation succeeds).
>

b5b217346de8 ("sched/topology: Warn when NUMA diameter > 2") was added in
5.11-rc1, and I believe was marked for stable.

It doesn't come with a scheduler behaviour change, it only catches
topologies that end up being silently (unless run with SCHED_DEBUG=y)
misrepresented / misinterpreted by the scheduler.

Up until now I had only seen it fire on a single, somewhat unusual
topology. As fixing it is far from trivial, I figured adding this warning
would let us build a case for actually fixing it if we get some more
reports.
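
As a rough way to tell from a distance table whether a machine is likely to trip this warning, one can count the distinct non-local distances. The Python sketch below does that for the 8-node table reported in this thread; it assumes, for illustration only, that each distinct non-local distance yields one NUMA level and that more than two such levels is the problematic case named in the commit title. `numa_levels` is a made-up helper name.

```python
# 8-node distance table reported in this thread.
DIST = [
    [10, 12, 12, 14, 14, 14, 14, 16],
    [12, 10, 14, 12, 14, 14, 12, 14],
    [12, 14, 10, 14, 12, 12, 14, 14],
    [14, 12, 14, 10, 12, 12, 14, 14],
    [14, 14, 12, 12, 10, 14, 12, 14],
    [14, 14, 12, 12, 14, 10, 14, 12],
    [14, 12, 14, 14, 12, 14, 10, 12],
    [16, 14, 14, 14, 14, 12, 12, 10],
]


def numa_levels(dist):
    """Count the distinct distances other than the local (diagonal) ones."""
    local = {dist[i][i] for i in range(len(dist))}
    everything = {d for row in dist for d in row}
    return len(everything - local)


levels = numa_levels(DIST)
print(levels)      # 3
print(levels > 2)  # True
```

The three levels (12, 14 and 16) are consistent with the three NUMA domains shown for CPU0 in the sched_debug output elsewhere in this thread.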

Could you paste the output of the below?

  $ cat /sys/devices/system/node/node*/distance

Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
appending 'sched_debug' to your cmdline should yield some extra data.