Scott Cheloha <scottchel...@gmail.com> wrote:

> > > How about this. Kill the spc_ldavg calculation. Its use is more than
> > > questionable. The cpu selection code using this is not working well and
> > > process stealing will do the rest.
>
> This is more or less what I said yesterday. The per-CPU load average
> is not useful for deciding where to put a thread.

I guess you have not been reading mpi's scheduler diff. The entire idea
of "placing a thread" is a flawed single-processor idea from the 1980s.

> Of the variables we
> have available to consider, only the current length of the runqueue is
> useful.

No, that concept is also broken.

On your 8-cpu laptop, the runqueue does not work at all. Typically, the
number of available cpus exceeds the number of ready-to-run processes.

For workloads where the ready process count exceeds the cpus, the
processes get put onto the wrong cpus' queues -- and because scheduler
code runs so rarely, this is all a waste of time.

What actually happens is that pretty much all process selection happens
when a process on a cpu goes to sleep: that cpu finds its runqueue is
empty because other cpus have stolen it empty, so it proceeds to steal
out of another cpu's runqueue. All process progress really depends upon
stealing processes, fixing up the other cpu's runqueue under locks, and
thus ignoring any pre-calculation by the 'scheduler code'.

All of this stealing requires big locks, to protect the scheduler code
which is occasionally (let's be honest -- rarely?) re-organizing these
stupid runqueues, which then get ignored in the typical case. Those
locks are so crazy weird, we've been confused for decades about how to
improve them.

It appears there are no small steps, and we probably need a
"Braindead / Dead Alive" lawnmower procedure, and then rebuild
afterwards.

I think you will soon join the club of people who believe this code
from 1980 is so completely unsuitable that not one line of it can be
reused.
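
To make the stealing path described above concrete, here is a rough
userland sketch. It is not the kernel code: every name in it (rq,
rq_push, rq_pop, pick_next, sched_lock, NCPU, RQ_MAX) is made up for
illustration, the "big lock" is just a flag, and picking the longest
queue as the victim is my own assumption about a plausible policy.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU   4   /* hypothetical cpu count */
#define RQ_MAX 16  /* hypothetical per-cpu queue capacity */

/* Hypothetical per-cpu runqueue: a small FIFO of process ids. */
struct runq {
	int pids[RQ_MAX];
	size_t len;
};

static struct runq rq[NCPU];
static int sched_locked;	/* stand-in for the one big scheduler lock */

static void sched_lock(void)   { assert(!sched_locked); sched_locked = 1; }
static void sched_unlock(void) { assert(sched_locked); sched_locked = 0; }

/* "Placement": queue a pid on the cpu somebody guessed was best. */
static void
rq_push(int cpu, int pid)
{
	assert(rq[cpu].len < RQ_MAX);
	rq[cpu].pids[rq[cpu].len++] = pid;
}

/* Remove and return the head of a cpu's queue, or -1 if it is empty. */
static int
rq_pop(int cpu)
{
	struct runq *q = &rq[cpu];
	size_t i;
	int pid;

	if (q->len == 0)
		return -1;
	pid = q->pids[0];
	for (i = 1; i < q->len; i++)
		q->pids[i - 1] = q->pids[i];
	q->len--;
	return pid;
}

/*
 * Called when the process running on `self` goes to sleep.
 * Returns the next pid to run, or -1 if there is nothing anywhere.
 * *stolen_from is the victim cpu, or -1 if the pid came from our own queue.
 */
static int
pick_next(int self, int *stolen_from)
{
	size_t best = 0;
	int c, pid, victim = -1;

	*stolen_from = -1;
	sched_lock();		/* everything below touches other cpus' queues */
	pid = rq_pop(self);
	if (pid == -1) {
		/* Own queue is empty: look for the longest other queue. */
		for (c = 0; c < NCPU; c++) {
			if (c != self && rq[c].len > best) {
				best = rq[c].len;
				victim = c;
			}
		}
		if (victim != -1) {
			pid = rq_pop(victim);
			*stolen_from = victim;
		}
	}
	sched_unlock();
	return pid;
}
```

If three pids are all placed on cpu 0 and cpus 1 and 2 each go idle,
two of the three end up running somewhere other than where they were
placed, and every one of those decisions was made under the single
lock. That is the point above: the placement step did not decide
anything.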