I have a couple of ideas on how this could be improved.

I don't know how eagerly the Linux kernel scheduler moves processes to
new CPUs. I'm sure there's some sort of cost function that considers the
performance cost of, e.g., moving your memory across physical boundaries
like NUMA nodes (?).

Sleeping for a fixed n seconds is always going to be technically flaky.
The simplest choice would be to wait longer, but then we need to sample
the new CPU that entire time. So my first proposal (A) is to sample the
CPU of the moved process in a loop, up to a timeout. That way, we catch
the process as soon as it lands there.
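
A rough sketch of what (A) could look like, in Python purely for
illustration. It assumes that the "processor" field (field 39) of
/proc/<pid>/stat is a good enough signal for "which CPU did this process
last run on". The names parse_processor, last_cpu and wait_for_cpu, and
the timeout/interval defaults, are hypothetical and not taken from the
test suite.

```python
import time


def parse_processor(stat_text):
    """Extract field 39 (CPU last executed on) from /proc/<pid>/stat text."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenising.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 sits at index 36.
    return int(rest[36])


def last_cpu(pid):
    """Return the CPU number the given process last ran on."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_processor(f.read())


def wait_for_cpu(pid, target_cpu, timeout=30.0, interval=0.1, sample=last_cpu):
    """Poll until pid is seen on target_cpu; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if sample(pid) == target_cpu:
            return True
        time.sleep(interval)
    return False
```

The sample argument is only there so the loop can be exercised without
a real process. A test would call wait_for_cpu(pid, newly_onlined_cpu)
for each spinning process and fail only if none of them ever shows up
on that CPU before the deadline.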

Another idea (B) is setting a CPU affinity on one or more of the
processes so that the scheduler MUST assign it to the newly-onlined
core. However, this changes the semantics of what we are testing, and
the test may no longer be valid for its original intended purpose.
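
For reference, pinning a process is a one-liner on Linux. This is only a
sketch of the mechanism, again in Python for illustration; pin_to_cpu is
a hypothetical helper name, while os.sched_getaffinity and
os.sched_setaffinity are the standard library calls (Linux only).

```python
import os


def pin_to_cpu(pid, cpu):
    """Restrict pid (0 means the calling process) to a single CPU.

    Returns the previous affinity mask so the caller can restore it.
    """
    previous = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {cpu})
    return previous
```

Note that once a process is pinned like this, seeing it on the target
CPU only proves that affinity works, not that the scheduler balances
load onto a newly-onlined core by itself, which is the concern above.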

Aside: I wish the docs explained in more detail which kernel bugs led to
these tests being created (disclaimer: I haven't looked that deeply).

Finally (C), perhaps we should offline all but one core, start all the
idle processes on that single core, and then open up the floodgates by
onlining the rest. If my assumptions about the scheduler's internal
workings are correct, this is the best-case scenario for it to
load-balance the processes across cores. It's definitely possible that
it won't happen though, so this may also require a timeout loop that
samples each core... sigh.
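
A sketch of that sequence, assuming the usual sysfs hotplug interface
(/sys/devices/system/cpu/cpuN/online, writable as root). The function
names and the start_workload callback are hypothetical; the root
parameter exists only so the logic can be tried against a fake
directory tree.

```python
import re
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def hotpluggable_cpus(root=CPU_ROOT):
    """Return sorted CPU numbers that expose an 'online' control file."""
    cpus = []
    for path in root.glob("cpu[0-9]*"):
        if re.fullmatch(r"cpu\d+", path.name) and (path / "online").exists():
            cpus.append(int(path.name[3:]))
    return sorted(cpus)


def set_online(cpu, online, root=CPU_ROOT):
    """Write 1 or 0 to the CPU's 'online' file."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def funnel_then_release(start_workload, keep=0, root=CPU_ROOT):
    """Offline every hotpluggable CPU except 'keep', start the workload,
    then bring the offlined CPUs back. Returns the CPUs that were toggled.
    """
    # A CPU without an 'online' file (often cpu0) is never listed here,
    # so it simply stays online.
    others = [c for c in hotpluggable_cpus(root) if c != keep]
    for cpu in others:
        set_online(cpu, False, root)
    try:
        start_workload()  # everything started here lands on the remaining CPUs
    finally:
        for cpu in others:
            set_online(cpu, True, root)
    return others
```

After funnel_then_release returns, the test would still have to poll
each re-onlined CPU up to a timeout to see whether any process actually
migrated there.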

To get a clearer picture of what will work, I need more info about:
- How the scheduler chooses a CPU core to run a process on
- The original bug this test was created for

Fortunately, the code is pretty straightforward. Note that as of writing
this comment, this test suite is in sync (no changes) with upstream
(https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/hotplug/cpu_hotplug).

Perhaps we could ask the upstream maintainers for guidance?

** Changed in: ubuntu-kernel-tests
     Assignee: (unassigned) => Benjamin Wheeler (benjaminwheeler)

** Changed in: ubuntu-kernel-tests
       Status: New => Confirmed

https://bugs.launchpad.net/bugs/1931390

Title:
  cpuhotplug03 in cpuhotplug from ubuntu_ltp failed after successful
  CPU1 offline
