** Description changed:

  [Impact]
- CONFIG_NUMA_BALANCING and CONFIG_NUMA_BALANCING_DEFAULT_ENABLED were both set to =y in hwe-x/hwe-y. This changed to =n in hwe-z, unintentionally as far as I can tell. This can lead to performance degradation on NUMA-based arm64 systems when processes migrate, and their memory accesses now suffer additional latency.
+ CONFIG_NUMA_BALANCING and CONFIG_NUMA_BALANCING_DEFAULT_ENABLED were both set to =y in hwe-x/hwe-y. This changed to =n in hwe-z, unintentionally as far as I can tell. This can lead to performance degradation on NUMA-based arm64 systems when threads migrate, and their memory accesses now suffer additional latency.

  [Test Case]
  At a functional level:
- test -f /proc/sys/kernel/numa_balancing
+ $ test -f /proc/sys/kernel/numa_balancing

- Performance?
+ Performance:
+
+ $ perf bench numa -a
+ I didn't see any significant changes in the RAM-bw tests (expected).
+ For the convergence tests, I observed the following results, which appear to be all within reasonable variance.
+
+ Test      | Balancing=n | Balancing=y
+ -------------------------------------
+ 1x3       | No-Converge | No-Converge
+ 1x4       | No-Converge | 0.576s
+ 1x6       | No-Converge | No-Converge
+ 2x3       | No-Converge | No-Converge
+ 3x3       | No-Converge | No-Converge
+ 4x4       | No-Converge | No-Converge
+ 4x4-NOTHP | No-Converge | No-Converge
+ 4x6       | No-Converge | No-Converge
+ 4x8       | No-Converge | No-Converge
+ 8x4       | No-Converge | No-Converge
+ 8x4-NOTHP | No-Converge | No-Converge
+ 3x1       | 0.848s      | 1.212s
+ 4x1       | 0.832s      | 0.712s
+ 8x1       | 0.792s      | 0.649s
+ 16x1      | 1.511s      | 1.485s
+ 32x1      | 0.750s      | 0.899s
+
+ Finally, for the bw tests, I see significant improvements across the board:
+ Test       | BW Improvement
+ -------------------------
+ ======= Process =========
+ 2x1        | 2.2%
+ 3x1        | 61.4%
+ 4x1        | 25.0%
+ 8x1        | 104.6%
+ 8x1-NOTHP  | 107.6%
+ 16x1       | 200.9%
+ ======= Thread ==========
+ 4x1        | 10.9%
+ 8x1        | 107.4%
+ 16x1       | 230.7%
+ 32x1       | 239.7%
+ 2x3        | 13.5%
+ 4x4        | 69.2%
+ 4x6        | 84.4%
+ 4x8        | 79.7%
+ 4x8-NOTHP  | 152.5%
+ 3x3        | 96.1%
+ 5x5        | 150.2%
+ 2x16       | 122.6%
+ 1x32       | 40.5%

  [Regression Risk]
+ This is changing a config only on arm64, so the regression risk will be limited to those platforms. The code we will be enabling on arm64 is already enabled on other architectures (!s390x), so has been tested within Ubuntu zesty already. This was previously also enabled on arm64 in hwe-x/hwe-y, so we can gain some confidence from that.
+
+ There is certainly a possibility that this negatively impacts
+ performance for certain workloads on NUMA/arm64 systems. If that occurs,
+ there is a sysctl that can be used to disable this feature.
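
The functional check and the sysctl fallback mentioned above can be sketched roughly as follows. This is a minimal illustration, not part of the bug report: the function name numa_balancing_state and its optional path argument are hypothetical (the argument exists only so the check can be pointed at a test file), and it assumes the sysctl in question is kernel.numa_balancing, exposed as /proc/sys/kernel/numa_balancing.

```shell
#!/bin/sh
# Sketch: report the state of automatic NUMA balancing.
# numa_balancing_state is a hypothetical helper name; the optional
# argument overrides the sysctl path for testing only.
numa_balancing_state() {
    path="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -f "$path" ]; then
        # Typically means the kernel lacks CONFIG_NUMA_BALANCING.
        echo "unavailable"
        return 1
    fi
    case "$(cat "$path")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "other" ;;   # any value other than 0 or 1
    esac
}

numa_balancing_state || true

# To turn the feature off at runtime if a workload regresses (as root):
#   sysctl -w kernel.numa_balancing=0
```

On a kernel built with CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y, the helper would be expected to print "enabled" after boot unless the setting has been changed.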
-- 
https://bugs.launchpad.net/bugs/1690914

Title:
  [Regression] NUMA_BALANCING disabled on arm64
