On Wed, 20 Feb 2019 14:08:45 -0300
Martin Pieuchot <[email protected]> wrote:
> When choosing the initial CPU for a new thread to run on, the scheduler
> do not consider that CPU0 is handling interrupts.
>
> Threads that are created early, like the network taskq, always end up
> scheduled on CPU0. If the machine isn't running many programs, like
> simple firewalls, there won't be contention and the thread will always
> stay on its original CPU.
>
> This behavior derives to how yield() and preempt() are implemented. These
> functions do not look for another CPU to run on when a busy thread tries to
> be cooperative. This is what we want. However in the case of the network
> taskq we'd prefer to avoid CPU0.
>
> I don't know if enforcing this behavior makes sense or if we should
> accept that in some moon phases our forwarding performances are only 50%
> of what they could be.
>
> However using the same logic to pick a CPU during fork than during a
> sleep cycle seems to be good enough to avoid the problem described above.
> If there's contention the taskq will move between CPUs, if there isn't it
> will stay, generally, on CPU1.
>
> Since FORK_PPWAIT isn't implemented since 10 years, I'd say it's also a
> simplification :o)
>
> What would you think of killing sched_choosecpu_fork()?
>
> Any other suggestion?
>
> ok?
>
> Index: kern/kern_fork.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_fork.c,v
> retrieving revision 1.209
> diff -u -p -r1.209 kern_fork.c
> --- kern/kern_fork.c 6 Jan 2019 12:59:45 -0000 1.209
> +++ kern/kern_fork.c 15 Feb 2019 13:49:32 -0000
> @@ -329,7 +329,7 @@ fork_thread_start(struct proc *p, struct
>
> SCHED_LOCK(s);
> p->p_stat = SRUN;
> - p->p_cpu = sched_choosecpu_fork(parent, flags);
> + p->p_cpu = sched_choosecpu(parent);
> setrunqueue(p);
> SCHED_UNLOCK(s);
> }
> Index: kern/kern_sched.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.54
> diff -u -p -r1.54 kern_sched.c
> --- kern/kern_sched.c 17 Nov 2018 23:10:08 -0000 1.54
> +++ kern/kern_sched.c 15 Feb 2019 13:50:06 -0000
> @@ -340,62 +340,6 @@ again:
> }
>
> struct cpu_info *
> -sched_choosecpu_fork(struct proc *parent, int flags)
> -{
> -#ifdef MULTIPROCESSOR
> - struct cpu_info *choice = NULL;
> - fixpt_t load, best_load = ~0;
> - int run, best_run = INT_MAX;
> - struct cpu_info *ci;
> - struct cpuset set;
> -
> -#if 0
> - /*
> - * XXX
> - * Don't do this until we have a painless way to move the cpu in exec.
> - * Preferably when nuking the old pmap and getting a new one on a
> - * new cpu.
> - */
> - /*
> - * PPWAIT forks are simple. We know that the parent will not
> - * run until we exec and choose another cpu, so we just steal its
> - * cpu.
> - */
> - if (flags & FORK_PPWAIT)
> - return (parent->p_cpu);
> -#endif
> -
> - /*
> - * Look at all cpus that are currently idle and have nothing queued.
> - * If there are none, pick the one with least queued procs first,
> - * then the one with lowest load average.
> - */
> - cpuset_complement(&set, &sched_queued_cpus, &sched_idle_cpus);
> - cpuset_intersection(&set, &set, &sched_all_cpus);
> - if (cpuset_first(&set) == NULL)
> - cpuset_copy(&set, &sched_all_cpus);
> -
> - while ((ci = cpuset_first(&set)) != NULL) {
> - cpuset_del(&set, ci);
> -
> - load = ci->ci_schedstate.spc_ldavg;
> - run = ci->ci_schedstate.spc_nrun;
> -
> - if (choice == NULL || run < best_run ||
> - (run == best_run &&load < best_load)) {
> - choice = ci;
> - best_load = load;
> - best_run = run;
> - }
> - }
> -
> - return (choice);
> -#else
> - return (curcpu());
> -#endif
> -}
> -
> -struct cpu_info *
> sched_choosecpu(struct proc *p)
> {
> #ifdef MULTIPROCESSOR
> Index: sys/sched.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/sched.h,v
> retrieving revision 1.50
> diff -u -p -r1.50 sched.h
> --- sys/sched.h 17 Nov 2018 23:10:08 -0000 1.50
> +++ sys/sched.h 15 Feb 2019 13:50:14 -0000
> @@ -150,7 +150,6 @@ void mi_switch(void);
> void cpu_switchto(struct proc *, struct proc *);
> struct proc *sched_chooseproc(void);
> struct cpu_info *sched_choosecpu(struct proc *);
> -struct cpu_info *sched_choosecpu_fork(struct proc *parent, int);
> void cpu_idle_enter(void);
> void cpu_idle_cycle(void);
> void cpu_idle_leave(void);
>
In sched_choosecpu_fork(), we see best_run is INT_MAX, the comparison below
actually reduces to a assignment ==> choice = ci;
- if (choice == NULL || run < best_run ||
- (run == best_run &&load < best_load)) {
- choice = ci;
- best_load = load;
- best_run = run;
- }
because run will always be less than INT_MAX + run can rarely be INT_MAX, and
choice is always NULL!
In sched_choosecpu(), last_cost is not being used where it could be useful.
Setting to INT_MAX, the comparison will always be false. This is like you can
safely delete sched_proc_to_cpu_cost() also. choice is again null.
int cost = sched_proc_to_cpu_cost(ci, p);
if (choice == NULL || cost < last_cost) {
choice = ci;
last_cost = cost;
}
cpuset_del(&set, ci);
Finally, the other call to sched_proc_to_cpu_cost() in kern/kern_fork.c can be
wrapped in ifdef MULTIPROCESSOR instead of calling the function if it's a
single CPU, save a function call.
static inline void
fork_thread_start(struct proc *p, struct proc *parent, int flags)
{
int s;
SCHED_LOCK(s);
p->p_stat = SRUN;
#ifdef MULTIPROCESSOR
p->p_cpu = sched_choosecpu(parent);
#else
p->p_cpu = 0;
#endif
setrunqueue(p);
SCHED_UNLOCK(s);
}