Applications using multiple threads often call sched_yield(2) to
indicate that one of the threads cannot make any progress because
it is waiting for a resource held by another one.

One example of this scenario is the _spinlock() implementation of
our librthread.  But if you look on https://codesearch.debian.net
you can find much more use cases, notably MySQL, PostgreSQL, JDK,
libreoffice, etc.
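For reference, the pattern in question looks roughly like this. This is only a sketch of a yielding spinlock in the spirit of librthread's _spinlock(), not the actual librthread code; the type and function names are illustrative:

```c
/*
 * Sketch of a spinlock that calls sched_yield(2) while contended,
 * as multi-threaded applications and librthread's _spinlock() do.
 * Names here are illustrative, not the real librthread symbols.
 */
#include <sched.h>
#include <stdatomic.h>

typedef struct {
	atomic_flag lock;
} yielding_spinlock;

static void
spin_lock(yielding_spinlock *sp)
{
	/*
	 * While another thread holds the lock, tell the scheduler
	 * we cannot make progress and give up the CPU.
	 */
	while (atomic_flag_test_and_set_explicit(&sp->lock,
	    memory_order_acquire))
		sched_yield();
}

static void
spin_unlock(yielding_spinlock *sp)
{
	atomic_flag_clear_explicit(&sp->lock, memory_order_release);
}
```

The interesting property is that the waiter spends its time in sched_yield(2) while the lock holder burns CPU, which is exactly the situation where the priority inversion described below bites.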

Now the problem with our current scheduler is that the priority of
a thread decreases while it is the "curproc" of a CPU.  So the threads
that don't run and call sched_yield(2) end up having a higher priority
than the thread holding the resource.  That makes it really hard for
such multi-threaded applications to make progress, and results in a
lot of IPIs.
It would also explain why, with more CPUs, say 4 instead of 2, your
application is more likely to make progress and you see less
stuttering/freezing.

So what the diff below does is penalize the threads of multi-threaded
applications so that progress can be made.  It is inspired by the
recent scheduler work done by Michal Mazurek on tech@.

I experimented with various values for "p_priority" and the one below
generates the fewest IPIs when watching an HD video in Firefox.
Because yes, with this diff, now I can.

I'd like to know if dereferencing "p_p" is safe without holding the
KERNEL_LOCK.

I'm also interested in hearing from more people using multi-threaded
applications.

Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.43
diff -u -p -r1.43 sched_bsd.c
--- kern/sched_bsd.c    9 Mar 2016 13:38:50 -0000       1.43
+++ kern/sched_bsd.c    19 Mar 2016 12:21:36 -0000
@@ -298,7 +298,16 @@ yield(void)
        int s;
 
        SCHED_LOCK(s);
-       p->p_priority = p->p_usrpri;
+       /*
+        * If one of the threads of a multi-threaded process called
+        * sched_yield(2), drop its priority to ensure its siblings
+        * can make some progress.
+        */
+       if (TAILQ_FIRST(&p->p_p->ps_threads) == p &&
+           TAILQ_NEXT(p, p_thr_link) == NULL)
+               p->p_priority = p->p_usrpri;
+       else
+               p->p_priority = min(MAXPRI, p->p_usrpri * 2);
        p->p_stat = SRUN;
        setrunqueue(p);
        p->p_ru.ru_nvcsw++;
