I have a couple of ideas on how this could be improved. I don't know how eager the Linux kernel scheduler is to move processes to new CPUs. I'm sure there's some sort of cost function that accounts for the performance cost of, e.g., moving a process's memory across physical boundaries like NUMA nodes (?).
Sleeping for n seconds is always going to be inherently flaky. The simplest fix would be to wait longer, but then we would need to sample the new CPU for that entire time. So my first proposal (A) is to sample the CPU of the moved process in a loop, up to a timeout. That way, we will catch it if the process lands there.

Another idea is setting a CPU affinity on one or more of the processes so that the scheduler MUST assign them to the newly-onlined core. However, this changes the semantics of what we are testing, and the test may no longer be valid for its original intended purpose. Aside: I wish the docs explained in more detail which kernel bugs led to these tests being created (disclaimer: I haven't looked that deeply).

Finally, perhaps we should offline all but one core, start all the idle processes on that single core, and then open the floodgates. If my assumptions about the scheduler's internals are correct, this is the best-case scenario for load-balancing the processes across cores. It's definitely possible that it won't happen, though, so this may also require a timeout loop that samples each core... sigh.

To get a clearer picture of what will work, I need more info about:
- How the scheduler chooses a CPU core to run a process on
- The original bug this test was created for

Fortunately, the code is pretty straightforward. Note that as of writing this comment, this test suite is in sync (no changes) with upstream (https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/hotplug/cpu_hotplug). Perhaps we could ask the upstream maintainers for guidance?

** Changed in: ubuntu-kernel-tests
   Assignee: (unassigned) => Benjamin Wheeler (benjaminwheeler)

** Changed in: ubuntu-kernel-tests
   Status: New => Confirmed
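A minimal sketch of proposal (A), written in Python to show the polling logic rather than as a patch to the LTP test: instead of one fixed sleep, poll the CPU each worker last ran on until one of them shows up on the newly-onlined CPU or a timeout expires. It reads the `processor` field (field 39) of /proc/<pid>/stat as documented in proc(5). The function names, the default 0.1 s interval, and any timeout value are hypothetical choices, not anything taken from the existing test.

```python
import os
import time


def parse_processor(stat_text: str) -> int:
    """Return the 'processor' field (field 39) of a /proc/<pid>/stat line.

    Field 2 (comm) may contain spaces and ')', so split after the LAST ')'.
    """
    rest = stat_text[stat_text.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return int(rest[39 - 3])


def current_cpu(pid: int) -> int:
    """CPU that `pid` last executed on, according to /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_processor(f.read())


def wait_for_any_on_cpu(pids, cpu: int, timeout: float, interval: float = 0.1) -> bool:
    """Poll until any pid in `pids` is seen on `cpu`, or `timeout` seconds pass.

    Hypothetical helper: always samples at least once, then sleeps `interval`
    seconds between rounds until the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        for pid in pids:
            try:
                if current_cpu(pid) == cpu:
                    return True
            except (FileNotFoundError, ProcessLookupError):
                continue  # the process exited; skip it
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_any_on_cpu(worker_pids, 1, timeout=30)` would return as soon as any worker is observed on CPU 1. Because the loop exits on the first hit, a generous timeout costs nothing when the test passes; it only lengthens the failing case. One limitation: `processor` is only the CPU last executed on, so a worker that briefly visits the new CPU and moves away between two samples can still be missed.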
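For the affinity idea, pinning a process is a single system call. A sketch assuming Python's `os.sched_setaffinity`, a thin wrapper around sched_setaffinity(2), the same call `taskset` uses; `pin_to_cpu` is a hypothetical helper name. This forces the placement instead of observing the scheduler's own choice, which is exactly the change in semantics mentioned in the comment. The call also has to happen after the CPU is back online, since sched_setaffinity(2) fails with EINVAL when the mask contains no online, permitted CPU.

```python
import os


def pin_to_cpu(pid: int, cpu: int) -> set:
    """Restrict `pid` (0 = the calling process) to `cpu`.

    Returns the previous affinity mask so the caller can restore it later.
    """
    previous = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {cpu})
    return previous
```

Returning the old mask keeps the change reversible, e.g. `os.sched_setaffinity(pid, previous)` once the check is done.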
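The "offline all but one core" idea maps onto the standard sysfs CPU hotplug interface, where writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes CPU N offline or brings it back. A minimal sketch, assuming root privileges; all helper names are hypothetical, and the `root` parameter exists only so the logic can be exercised against a fake directory tree. CPUs without an `online` file (often cpu0) cannot be hot-unplugged and are left alone, and CPUs that were already offline are not brought back afterwards.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def _online_path(cpu: int, root: str) -> str:
    return os.path.join(root, f"cpu{cpu}", "online")


def hotpluggable_cpus(root: str = SYSFS_CPU) -> list:
    """CPUs that expose an 'online' file, i.e. can be hot-(un)plugged."""
    cpus = []
    for name in os.listdir(root):
        if name.startswith("cpu") and name[3:].isdigit():
            if os.path.exists(_online_path(int(name[3:]), root)):
                cpus.append(int(name[3:]))
    return sorted(cpus)


def is_online(cpu: int, root: str = SYSFS_CPU) -> bool:
    with open(_online_path(cpu, root)) as f:
        return f.read().strip() == "1"


def set_online(cpu: int, online: bool, root: str = SYSFS_CPU) -> None:
    with open(_online_path(cpu, root), "w") as f:
        f.write("1" if online else "0")


def offline_all_but(keep: int, root: str = SYSFS_CPU) -> list:
    """Offline every currently-online hotpluggable CPU except `keep`.

    Returns the CPUs that were taken offline, so only those get restored.
    """
    taken = [c for c in hotpluggable_cpus(root) if c != keep and is_online(c, root)]
    for cpu in taken:
        set_online(cpu, False, root)
    return taken


def online_all(cpus, root: str = SYSFS_CPU) -> None:
    for cpu in cpus:
        set_online(cpu, True, root)
```

Roughly: `taken = offline_all_but(0)`, start the idle workers (they can only land on CPU 0), `online_all(taken)`, then sample each CPU in `taken` with a timeout loop, as in proposal (A).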
https://bugs.launchpad.net/bugs/1931390

Title:
  cpuhotplug03 in cpuhotplug from ubuntu_ltp failed after successful CPU1 offline
