** Description changed:

- On scobee-kernel(arm64) with hirsute:linux(5.11.0-41.45) for
- sru-20211108 there are several reports about the sched domain not
- covering the full range. The same does not happen on kuzzle. But 32 is a
- bit of a suspicious number.
+ [Impact]
+ The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920
+ server that has 4 NUMA nodes:
+ https://launchpad.net/bugs/1951289
- Running tests.......
- cpuset_sched_domains 1 TINFO: CPUs are numbered continuously starting at 0 (0-127)
- cpuset_sched_domains 1 TINFO: Nodes are numbered continuously starting at 0 (0-3)
- cpuset_sched_domains 1 TINFO: root group load balance test
- cpuset_sched_domains 1 TINFO: sched load balance: 0
- cpuset_sched_domains 1 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 1 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 3 TINFO: root group load balance test
- cpuset_sched_domains 3 TINFO: sched load balance: 1
- cpuset_sched_domains 3 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 3 TFAIL: partition sched domains failed.
- cpuset_sched_domains 5 TINFO: root group load balance test
- cpuset_sched_domains 5 TINFO: sched load balance: 0
- cpuset_sched_domains 5 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 5 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 7 TINFO: root group load balance test
- cpuset_sched_domains 7 TINFO: sched load balance: 0
- cpuset_sched_domains 7 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 7 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 9 TINFO: root group load balance test
- cpuset_sched_domains 9 TINFO: sched load balance: 1
- cpuset_sched_domains 9 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 9 TFAIL: partition sched domains failed.
- cpuset_sched_domains 11 TINFO: root group load balance test
- cpuset_sched_domains 11 TINFO: sched load balance: 1
- cpuset_sched_domains 11 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 11 TFAIL: partition sched domains failed.
- cpuset_sched_domains 13 TINFO: general group load balance test
- cpuset_sched_domains 13 TINFO: root group info:
- cpuset_sched_domains 13 TINFO: sched load balance: 0
- cpuset_sched_domains 13 TINFO: general group info:
- cpuset_sched_domains 13 TINFO: cpus: -
- cpuset_sched_domains 13 TINFO: sched load balance: 1
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 13 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 15 TINFO: general group load balance test
- cpuset_sched_domains 15 TINFO: root group info:
- cpuset_sched_domains 15 TINFO: sched load balance: 0
- cpuset_sched_domains 15 TINFO: general group info:
- cpuset_sched_domains 15 TINFO: cpus: 1
- cpuset_sched_domains 15 TINFO: sched load balance: 0
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 15 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 17 TINFO: general group load balance test
- cpuset_sched_domains 17 TINFO: root group info:
- cpuset_sched_domains 17 TINFO: sched load balance: 1
- cpuset_sched_domains 17 TINFO: general group info:
- cpuset_sched_domains 17 TINFO: cpus: -
- cpuset_sched_domains 17 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 17 TFAIL: partition sched domains failed.
- cpuset_sched_domains 19 TINFO: general group load balance test
- cpuset_sched_domains 19 TINFO: root group info:
- cpuset_sched_domains 19 TINFO: sched load balance: 1
- cpuset_sched_domains 19 TINFO: general group info:
- cpuset_sched_domains 19 TINFO: cpus: 1
- cpuset_sched_domains 19 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 19 TFAIL: partition sched domains failed.
- cpuset_sched_domains 21 TINFO: general group load balance test
- cpuset_sched_domains 21 TINFO: root group info:
- cpuset_sched_domains 21 TINFO: sched load balance: 0
- cpuset_sched_domains 21 TINFO: general group info:
- cpuset_sched_domains 21 TINFO: cpus: 1,2
- cpuset_sched_domains 21 TINFO: sched load balance: 0
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 21 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 23 TINFO: general group load balance test
- cpuset_sched_domains 23 TINFO: root group info:
- cpuset_sched_domains 23 TINFO: sched load balance: 0
- cpuset_sched_domains 23 TINFO: general group info:
- cpuset_sched_domains 23 TINFO: cpus: 1,2
- cpuset_sched_domains 23 TINFO: sched load balance: 1
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 23 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 25 TINFO: general group load balance test
- cpuset_sched_domains 25 TINFO: root group info:
- cpuset_sched_domains 25 TINFO: sched load balance: 0
- cpuset_sched_domains 25 TINFO: general group info:
- cpuset_sched_domains 25 TINFO: cpus: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
- cpuset_sched_domains 25 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 25 TFAIL: partition sched domains failed.
- cpuset_sched_domains 27 TINFO: general group load balance test
- cpuset_sched_domains 27 TINFO: root group info:
- cpuset_sched_domains 27 TINFO: sched load balance: 0
- cpuset_sched_domains 27 TINFO: general group1 info:
- cpuset_sched_domains 27 TINFO: cpus: 1
- cpuset_sched_domains 27 TINFO: sched load balance: 1
- cpuset_sched_domains 27 TINFO: general group2 info:
- cpuset_sched_domains 27 TINFO: cpus: 0
- cpuset_sched_domains 27 TINFO: sched load balance: 1
- cpuset_sched_domains 27 TINFO: CPU hotplug: none
- cpuset_sched_domains 27 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 29 TINFO: general group load balance test
- cpuset_sched_domains 29 TINFO: root group info:
- cpuset_sched_domains 29 TINFO: sched load balance: 0
- cpuset_sched_domains 29 TINFO: general group1 info:
- cpuset_sched_domains 29 TINFO: cpus: 1,2
- cpuset_sched_domains 29 TINFO: sched load balance: 1
- cpuset_sched_domains 29 TINFO: general group2 info:
- cpuset_sched_domains 29 TINFO: cpus: 0-3
- cpuset_sched_domains 29 TINFO: sched load balance: 0
- cpuset_sched_domains 29 TINFO: CPU hotplug: none
- cpuset_sched_domains 29 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 31 TINFO: general group load balance test
- cpuset_sched_domains 31 TINFO: root group info:
- cpuset_sched_domains 31 TINFO: sched load balance: 0
- cpuset_sched_domains 31 TINFO: general group1 info:
- cpuset_sched_domains 31 TINFO: cpus: 1,2
- cpuset_sched_domains 31 TINFO: sched load balance: 1
- cpuset_sched_domains 31 TINFO: general group2 info:
- cpuset_sched_domains 31 TINFO: cpus: 0,3
- cpuset_sched_domains 31 TINFO: sched load balance: 1
- cpuset_sched_domains 31 TINFO: CPU hotplug: none
- cpuset_sched_domains 31 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 33 TINFO: general group load balance test
- cpuset_sched_domains 33 TINFO: root group info:
- cpuset_sched_domains 33 TINFO: sched load balance: 0
- cpuset_sched_domains 33 TINFO: general group1 info:
- cpuset_sched_domains 33 TINFO: cpus: 1,2
- cpuset_sched_domains 33 TINFO: sched load balance: 1
- cpuset_sched_domains 33 TINFO: general group2 info:
- cpuset_sched_domains 33 TINFO: cpus: 1,3
- cpuset_sched_domains 33 TINFO: sched load balance: 1
- cpuset_sched_domains 33 TINFO: CPU hotplug: none
- cpuset_sched_domains 33 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 35 TINFO: general group load balance test
- cpuset_sched_domains 35 TINFO: root group info:
- cpuset_sched_domains 35 TINFO: sched load balance: 0
- cpuset_sched_domains 35 TINFO: general group1 info:
- cpuset_sched_domains 35 TINFO: cpus: 1,2
- cpuset_sched_domains 35 TINFO: sched load balance: 1
- cpuset_sched_domains 35 TINFO: general group2 info:
- cpuset_sched_domains 35 TINFO: cpus: 1,3
- cpuset_sched_domains 35 TINFO: sched load balance: 1
- cpuset_sched_domains 35 TINFO: CPU hotplug: offline
- cpuset_sched_domains 35 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 37 TINFO: general group load balance test
- cpuset_sched_domains 37 TINFO: root group info:
- cpuset_sched_domains 37 TINFO: sched load balance: 0
- cpuset_sched_domains 37 TINFO: general group1 info:
- cpuset_sched_domains 37 TINFO: cpus: 1,2
- cpuset_sched_domains 37 TINFO: sched load balance: 1
- cpuset_sched_domains 37 TINFO: general group2 info:
- cpuset_sched_domains 37 TINFO: cpus: 1,3
- cpuset_sched_domains 37 TINFO: sched load balance: 1
- cpuset_sched_domains 37 TINFO: CPU hotplug: online
- cpuset_sched_domains 37 TPASS: partition sched domains succeeded.
- INFO: ltp-pan reported some tests FAIL
- LTP Version: 20210927
- INFO: Test end time: Sat Nov 6 19:28:17 UTC 2021
+ This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
+ CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
+ I assume this means the scheduler is making suboptimal decisions about
+ where to place/move processes.
+
+ [Test Case]
+ On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the level-3 scheduling domain (domain3):
+
+ ubuntu@d06-4:~$ grep domain2 /proc/schedstat | wc -l
+ 128
+ ubuntu@d06-4:~$ grep domain3 /proc/schedstat | wc -l
+ 64
+ ubuntu@d06-4:~$
+
+ [What Could Go Wrong]
+ This changes the code used for populating sched domains, so it could break on other systems, leading to poor scheduling characteristics (higher latencies, lower overall throughput, etc.).
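
The grep counts in the test case only show how many CPUs have a given domain level, not which CPUs are short a level. A minimal sketch in Python that groups CPUs by the number of domain levels listed under them might look like the following. It assumes the usual /proc/schedstat layout, where each `cpuN` line is followed by that CPU's `domainN` lines; the function names are hypothetical and not part of LTP or the kernel.

```python
from collections import defaultdict


def domain_levels_per_cpu(schedstat_text):
    """Return {cpu_name: number of domainN lines listed under that CPU}.

    Assumes each 'cpuN ...' line is followed by that CPU's 'domainN ...' lines.
    """
    levels = {}
    current = None
    for line in schedstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("cpu"):
            current = fields[0]
            levels[current] = 0
        elif fields[0].startswith("domain") and current is not None:
            levels[current] += 1
    return levels


def group_by_depth(levels):
    """Return {depth: [cpu_name, ...]} so CPUs with fewer levels stand out."""
    groups = defaultdict(list)
    for cpu, depth in levels.items():
        groups[depth].append(cpu)
    return dict(groups)


# Usage on a live system (requires schedstats support in the kernel):
#   with open("/proc/schedstat") as f:
#       groups = group_by_depth(domain_levels_per_cpu(f.read()))
#   for depth, cpus in sorted(groups.items()):
#       print(depth, "levels:", len(cpus), "CPUs")
```

On the affected system described above, this would be expected to report two groups of 64 CPUs each, one with 4 levels and one with 3, matching the grep counts.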
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1951289

Title:
  ubuntu_ltp_controllers:cpuset_sched_domains: tests 3,9,11,17,19,25
  report incorrect sched domain for cpu#32

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1951289/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
