Re: sched_steal_proc(): steal from highest loaded CPU

Amit Kulkarni Tue, 04 Feb 2020 11:15:14 -0800

> > When a cpu is idle, and wants to steal, it should steal from worst loaded 
> > cpu, i.e. with the highest cost, not the least cost.
> 
> What you say might be a valid choice.  However I'm not sure to
> understand how does it relate to the behavior of the code?
> 
> When stealing sched_proc_to_cpu_cost() is always called with for the
> same CPU, `ci' is always the same, which means `cost' is equivalent to
> `p_usrpri', right?
> 
> Now the priority space is defined in sys/param.h and the higher priority
> correspond to a small number.  So are we sure we want to steal a thread
> with the highest `cost', which would mean with the lowest priority?  Or
> did I miss something?


When a cpu is idle, it is looking for a proc to steal. Now these two loops 
(outer queued procs cpu loop + inner runqueue per cpu loop) basically loops 
over all procs in the first readily runnable runqueue per each cpu. We should 
ideally steal from the most loaded CPU, denoted by cost. Cost takes into 
account: priority, runqueue length, primary cpu check, load average, and 
finally memory footprint. Here, the most dominant variable should be the 
runqueue length, since it is a multiplication with sched_cost_runnable. 

'ci' changes after we proceed to another cpu, and cost will be different for 
each cpu+proc combination. Actually, in the steal case, if there is a large 
difference in spc_curpriority and p_usrpri between running proc which is 
SONPROC vs the current proc, then the cost will be skewed towards an outlier, 
but this will happen for all procs for that ci as you analyzed.

To summarize: currently, if a cpu is running 1-2 procs with large differences 
in priority, there is a high possibility that this function will go steal a 
proc from it for the just idled cpu. Whereas at the same time, another cpu is 
running 5-7 procs with exact same priorities, that will be ignored in the above 
calculation! We should have stolen from the cpu with 5-7 procs.

I hope this is a correct interpretation? Is there something wrong in this 
analysis?

Another small optimization is to issue a break right after the if(cost > 
bestcost) check, from the TAILQ_FOREACH, so we short circuit the potential 
exponential lookup (i * j). Currently, this is O(N * N )! With that 
optimization in place, we now looked at this cpu, and maybe we consider keeping 
this unpegged proc in mind, for our stealing calculation. Let's now go look at 
just 1 proc in other cpu's. Why iterate all procs at the head of runqueue of 
each cpu (worst case), one proc is enough per cpu in this stealing calculation, 
is it not? Once we decide which is the highest cost, we send it to the idle 
cpu. Basically, we try to shorten this O(N * N) lookup. My guess, for machines 
with large ncpu, this will be really noticeable slowness, in sched_steal_proc().

Thanks

> 
> > Index: kern/kern_sched.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_sched.c,v
> > retrieving revision 1.64
> > diff -u -p -u -p -r1.64 kern_sched.c
> > --- kern/kern_sched.c       30 Jan 2020 08:51:27 -0000      1.64
> > +++ kern/kern_sched.c       4 Feb 2020 14:25:59 -0000
> > @@ -487,7 +487,7 @@ sched_steal_proc(struct cpu_info *self)
> >     struct proc *best = NULL;
> >  #ifdef MULTIPROCESSOR
> >     struct schedstate_percpu *spc;
> > -   int bestcost = INT_MAX;
> > +   int bestcost = 0;
> >     struct cpu_info *ci;
> >     struct cpuset set;
> >  
> > @@ -515,7 +515,7 @@ sched_steal_proc(struct cpu_info *self)
> >  
> >                     cost = sched_proc_to_cpu_cost(self, p);
> >  
> > -                   if (best == NULL || cost < bestcost) {
> > +                   if (cost > bestcost) {
> >                             best = p;
> >                             bestcost = cost;
> >                     }

a break statement right here.

> > @@ -524,7 +524,6 @@ sched_steal_proc(struct cpu_info *self)
> >     if (best == NULL)
> >             return (NULL);
> >  
> > -   spc = &best->p_cpu->ci_schedstate;
> >     remrunqueue(best);
> >     best->p_cpu = self;
> >  
> > 


-- 
Amit Kulkarni <[email protected]>

Re: sched_steal_proc(): steal from highest loaded CPU

Reply via email to