> Date: Mon, 20 Dec 2010 18:36:54 +0000 > From: Miod Vallat <[email protected]> > > I just noticed this morning that msleep() will not behave correctly on > some platforms, when passed a mutex associated to the IPL_NONE level > (i.e. plain mutex, without an associated spl semantic). > > msleep() needs to run at splsched(), and fiddles with the mutex fields > to make sure mtx_leave() will not lower spl below IPL_SCHED, and fiddles > later on, after taking the mutex back, to make the next mtx_leave() > return to the level the first mtx_enter() was run at. > > This works on amd64, i386, powerpc and sparc64, because the mutex > implementation on these platforms always performs an spl update. > > All other platforms have a slightly different implementation which skips > the splraise()/splx() calls if the mutex is associated to IPL_NONE. This > was done mainly for speed, since some of these platforms have somewhat > expensive spl calls (in comparison to the overhead of the comparison + > branch). > > When invoking msleep() on an IPL_NONE mutex on these platforms, msleep() > will raise to IPL_SCHED, but the next mtx_leave() call, after msleep() > returns, will not alter the current level. Things continue running at > splsched() until the next sleep or preemption. > > There are two ways to fix this: > - make all mutex implementation perform spl operations, even if they are > associated to IPL_NONE. Easy to do. > - make msleep() aware that IPL_NONE mutexens may not change spl, and > avoid using the mutex bowels to store the temporary splsched() > sequence. This is easy to do as well, but requires an extra macro to > be defined, to let msleep() know the ipl the mutex is associated to > (something like #define MUTEX_WANTIPL(mtx) (mtx)->mtx_wantipl) > > Or is there any better option?
I think the solution is to make sure splx() isn't expensive if it doesn't actually lower the priority level. If the guts of splx() really are expensive, we should check whether we are already at the desired priority level and skip the expensive bits if we are. They should be a no-op in that case. Now that check can still be somewhat expensive if the current priority level is kept in a hardware register on some slow bus far away. But then we could consider keeping a software copy in struct cpu_info or so. Making splx() cheap in that case will help with nested splxxx()'s as well. At that point, making all mutex implementations perform spl operations unconditionally becomes the obvious thing to do.
