** Description changed:

- On scobee-kernel(arm64) with hirsute:linux(5.11.0-41.45) for
- sru-20211108 there are several reports about the sched domain not
- covering the full range. The same does not happen on kuzzle. But 32 is a
- bit of a suspicious number.
+ [Impact]
+ The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng 920
+ server that has 4 NUMA nodes:
+   https://launchpad.net/bugs/1951289
  
-   Running tests.......
-   cpuset_sched_domains 1 TINFO: CPUs are numbered continuously starting at 0 (0-127)
-   cpuset_sched_domains 1 TINFO: Nodes are numbered continuously starting at 0 (0-3)
-   cpuset_sched_domains 1 TINFO: root group load balance test
-   cpuset_sched_domains 1 TINFO:      sched load balance: 0
-   cpuset_sched_domains 1 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 1 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 3 TINFO: root group load balance test
-   cpuset_sched_domains 3 TINFO:      sched load balance: 1
-   cpuset_sched_domains 3 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 3 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 5 TINFO: root group load balance test
-   cpuset_sched_domains 5 TINFO:      sched load balance: 0
-   cpuset_sched_domains 5 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 5 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 7 TINFO: root group load balance test
-   cpuset_sched_domains 7 TINFO:      sched load balance: 0
-   cpuset_sched_domains 7 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 7 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 9 TINFO: root group load balance test
-   cpuset_sched_domains 9 TINFO:      sched load balance: 1
-   cpuset_sched_domains 9 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 9 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 11 TINFO: root group load balance test
-   cpuset_sched_domains 11 TINFO:      sched load balance: 1
-   cpuset_sched_domains 11 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 11 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 13 TINFO: general group load balance test
-   cpuset_sched_domains 13 TINFO: root group info:
-   cpuset_sched_domains 13 TINFO:      sched load balance: 0
-   cpuset_sched_domains 13 TINFO: general group info:
-   cpuset_sched_domains 13 TINFO:      cpus: -
-   cpuset_sched_domains 13 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 13 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 15 TINFO: general group load balance test
-   cpuset_sched_domains 15 TINFO: root group info:
-   cpuset_sched_domains 15 TINFO:      sched load balance: 0
-   cpuset_sched_domains 15 TINFO: general group info:
-   cpuset_sched_domains 15 TINFO:      cpus: 1
-   cpuset_sched_domains 15 TINFO:      sched load balance: 0
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 15 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 17 TINFO: general group load balance test
-   cpuset_sched_domains 17 TINFO: root group info:
-   cpuset_sched_domains 17 TINFO:      sched load balance: 1
-   cpuset_sched_domains 17 TINFO: general group info:
-   cpuset_sched_domains 17 TINFO:      cpus: -
-   cpuset_sched_domains 17 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 17 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 19 TINFO: general group load balance test
-   cpuset_sched_domains 19 TINFO: root group info:
-   cpuset_sched_domains 19 TINFO:      sched load balance: 1
-   cpuset_sched_domains 19 TINFO: general group info:
-   cpuset_sched_domains 19 TINFO:      cpus: 1
-   cpuset_sched_domains 19 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 19 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 21 TINFO: general group load balance test
-   cpuset_sched_domains 21 TINFO: root group info:
-   cpuset_sched_domains 21 TINFO:      sched load balance: 0
-   cpuset_sched_domains 21 TINFO: general group info:
-   cpuset_sched_domains 21 TINFO:      cpus: 1,2
-   cpuset_sched_domains 21 TINFO:      sched load balance: 0
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 21 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 23 TINFO: general group load balance test
-   cpuset_sched_domains 23 TINFO: root group info:
-   cpuset_sched_domains 23 TINFO:      sched load balance: 0
-   cpuset_sched_domains 23 TINFO: general group info:
-   cpuset_sched_domains 23 TINFO:      cpus: 1,2
-   cpuset_sched_domains 23 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 23 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 25 TINFO: general group load balance test
-   cpuset_sched_domains 25 TINFO: root group info:
-   cpuset_sched_domains 25 TINFO:      sched load balance: 0
-   cpuset_sched_domains 25 TINFO: general group info:
-   cpuset_sched_domains 25 TINFO:      cpus: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
-   cpuset_sched_domains 25 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 25 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 27 TINFO: general group load balance test
-   cpuset_sched_domains 27 TINFO: root group info:
-   cpuset_sched_domains 27 TINFO:      sched load balance: 0
-   cpuset_sched_domains 27 TINFO: general group1 info:
-   cpuset_sched_domains 27 TINFO:      cpus: 1
-   cpuset_sched_domains 27 TINFO:      sched load balance: 1
-   cpuset_sched_domains 27 TINFO: general group2 info:
-   cpuset_sched_domains 27 TINFO:      cpus: 0
-   cpuset_sched_domains 27 TINFO:      sched load balance: 1
-   cpuset_sched_domains 27 TINFO: CPU hotplug: none
-   cpuset_sched_domains 27 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 29 TINFO: general group load balance test
-   cpuset_sched_domains 29 TINFO: root group info:
-   cpuset_sched_domains 29 TINFO:      sched load balance: 0
-   cpuset_sched_domains 29 TINFO: general group1 info:
-   cpuset_sched_domains 29 TINFO:      cpus: 1,2
-   cpuset_sched_domains 29 TINFO:      sched load balance: 1
-   cpuset_sched_domains 29 TINFO: general group2 info:
-   cpuset_sched_domains 29 TINFO:      cpus: 0-3
-   cpuset_sched_domains 29 TINFO:      sched load balance: 0
-   cpuset_sched_domains 29 TINFO: CPU hotplug: none
-   cpuset_sched_domains 29 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 31 TINFO: general group load balance test
-   cpuset_sched_domains 31 TINFO: root group info:
-   cpuset_sched_domains 31 TINFO:      sched load balance: 0
-   cpuset_sched_domains 31 TINFO: general group1 info:
-   cpuset_sched_domains 31 TINFO:      cpus: 1,2
-   cpuset_sched_domains 31 TINFO:      sched load balance: 1
-   cpuset_sched_domains 31 TINFO: general group2 info:
-   cpuset_sched_domains 31 TINFO:      cpus: 0,3
-   cpuset_sched_domains 31 TINFO:      sched load balance: 1
-   cpuset_sched_domains 31 TINFO: CPU hotplug: none
-   cpuset_sched_domains 31 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 33 TINFO: general group load balance test
-   cpuset_sched_domains 33 TINFO: root group info:
-   cpuset_sched_domains 33 TINFO:      sched load balance: 0
-   cpuset_sched_domains 33 TINFO: general group1 info:
-   cpuset_sched_domains 33 TINFO:      cpus: 1,2
-   cpuset_sched_domains 33 TINFO:      sched load balance: 1
-   cpuset_sched_domains 33 TINFO: general group2 info:
-   cpuset_sched_domains 33 TINFO:      cpus: 1,3
-   cpuset_sched_domains 33 TINFO:      sched load balance: 1
-   cpuset_sched_domains 33 TINFO: CPU hotplug: none
-   cpuset_sched_domains 33 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 35 TINFO: general group load balance test
-   cpuset_sched_domains 35 TINFO: root group info:
-   cpuset_sched_domains 35 TINFO:      sched load balance: 0
-   cpuset_sched_domains 35 TINFO: general group1 info:
-   cpuset_sched_domains 35 TINFO:      cpus: 1,2
-   cpuset_sched_domains 35 TINFO:      sched load balance: 1
-   cpuset_sched_domains 35 TINFO: general group2 info:
-   cpuset_sched_domains 35 TINFO:      cpus: 1,3
-   cpuset_sched_domains 35 TINFO:      sched load balance: 1
-   cpuset_sched_domains 35 TINFO: CPU hotplug: offline
-   cpuset_sched_domains 35 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 37 TINFO: general group load balance test
-   cpuset_sched_domains 37 TINFO: root group info:
-   cpuset_sched_domains 37 TINFO:      sched load balance: 0
-   cpuset_sched_domains 37 TINFO: general group1 info:
-   cpuset_sched_domains 37 TINFO:      cpus: 1,2
-   cpuset_sched_domains 37 TINFO:      sched load balance: 1
-   cpuset_sched_domains 37 TINFO: general group2 info:
-   cpuset_sched_domains 37 TINFO:      cpus: 1,3
-   cpuset_sched_domains 37 TINFO:      sched load balance: 1
-   cpuset_sched_domains 37 TINFO: CPU hotplug: online
-   cpuset_sched_domains 37 TPASS: partition sched domains succeeded.
-   INFO: ltp-pan reported some tests FAIL
-   LTP Version: 20210927
-   INFO: Test end time: Sat Nov  6 19:28:17 UTC 2021
+ This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
+ CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
+ I assume this means the scheduler is making suboptimal decisions about
+ where to place and migrate processes.
+ 
+ [Test Case]
+ On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the
+ 4th scheduling domain level (domain3):
+ 
+ ubuntu@d06-4:~$ grep domain2 /proc/schedstat  | wc -l
+ 128
+ ubuntu@d06-4:~$ grep domain3 /proc/schedstat  | wc -l
+ 64
+ ubuntu@d06-4:~$ 
+ 
+ [What Could Go Wrong]
+ This changes the code used for populating sched domains, so it could break
+ on other systems, potentially leading to poor scheduling characteristics
+ (higher latencies, lower overall throughput, etc.).
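The grep counts in [Test Case] only give totals; the per-CPU picture, and the span the LTP check compares for cpu32, can be pulled out of /proc/schedstat directly. The following is a minimal POSIX sh/awk sketch, not part of LTP or the kernel. The function names domain_depths and domain_spans are hypothetical, and it assumes the layout printed by 5.11-era kernels, where the second field of each "domainN" line is the domain span printed as a hex cpumask (32-bit words separated by commas, most significant word first). Newer schedstat versions may insert extra fields, in which case the field position would need adjusting.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting sched domains in /proc/schedstat.
# Assumption: each "domainN" line's 2nd field is the span as a hex cpumask
# (32-bit words separated by commas, most significant word first).

# domain_depths [FILE]: print "cpuN LEVELS" for every CPU in FILE.
domain_depths() {
    awk '
        /^cpu[0-9]+ / { cpu = $1; order[++n] = cpu; depth[cpu] = 0; next }
        /^domain[0-9]+ / { if (cpu != "") depth[cpu]++ }
        END { for (i = 1; i <= n; i++) print order[i], depth[order[i]] }
    ' "${1:-/proc/schedstat}"
}

# domain_spans CPU [FILE]: print "domainN CPULIST" for each domain level
# of the given CPU, decoding the hex mask into a list such as "0-95".
domain_spans() {
    awk -v want="cpu$1" '
        function sep(s) { return s == "" ? "" : "," }
        function rng(a, z) { return a == z ? a : a "-" z }
        function cpulist(mask,    hex, len, i, d, b, bit, out, start, prev) {
            gsub(",", "", mask)
            hex = "0123456789abcdef"
            len = length(mask)
            out = ""; start = -1; prev = -2
            # Walk from the last hex digit (CPUs 0-3) towards the first.
            for (i = len; i >= 1; i--) {
                d = index(hex, tolower(substr(mask, i, 1))) - 1
                for (b = 0; b < 4; b++) {
                    bit = (len - i) * 4 + b
                    if (d % 2 == 1) {
                        # A gap closes the current range and starts a new one.
                        if (bit != prev + 1) {
                            if (start >= 0) out = out sep(out) rng(start, prev)
                            start = bit
                        }
                        prev = bit
                    }
                    d = int(d / 2)
                }
            }
            if (start >= 0) out = out sep(out) rng(start, prev)
            return out
        }
        /^cpu[0-9]+ / { cur = $1; next }
        /^domain[0-9]+ / && cur == want { print $1, cpulist($2) }
    ' "${2:-/proc/schedstat}"
}
```

After loading the functions into a shell, `domain_depths | awk '{print $2}' | sort -n | uniq -c` groups CPUs by number of domain levels; given the grep results above, this would be expected to split the 128 CPUs evenly between 3 and 4 levels. `domain_spans 32` lists the span of each domain level for cpu32, the CPU flagged by the LTP check. Decoding the mask in awk (rather than relying on gawk's strtonum) keeps the sketch portable to mawk and busybox awk.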

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1951289

Title:
  ubuntu_ltp_controllers:cpuset_sched_domains: tests 3,9,11,17,19,25
  report incorrect sched domain for cpu#32

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs