> Date: Mon, 6 Sep 2021 21:43:29 +0200
> From: Patrick Wildt <patr...@blueri.se>
> 
> On Fri, Jul 30, 2021 at 07:55:29PM +0200, Alexander Bluhm wrote:
> > On Mon, Jul 26, 2021 at 08:12:39AM -0500, Scott Cheloha wrote:
> > > On Fri, Jun 25, 2021 at 06:09:27PM -0500, Scott Cheloha wrote:
> > > 1 month bump.  I really appreciate the tests I've gotten so far, thank
> > > you.
> > 
> > On my Xeon machine it works and all regress tests pass.
> > 
> > But it fails on my old Opteron machine.  It hangs after attaching
> > cpu1.
> 
> This seems to be caused by contention on the mutex in i8254's gettick().
> 
> With Scott's diff, delay_func is i8254_delay() on that old AMD machine.
> Its gettick() implementation uses a mutex to protect I/O access to the
> i8254.
> 
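For reference, the contended path is roughly this (my sketch of the i8254
readback, not the exact clock.c code; register names are approximate):

	u_int
	gettick(void)
	{
		u_char lo, hi;

		/* Serialize port I/O to the single 8254 counter. */
		mtx_enter(&timer_mutex);
		/* Latch counter 0, then read it back, low byte first. */
		outb(IO_TIMER1 + TIMER_MODE, TIMER_SEL0 | TIMER_LATCH);
		lo = inb(IO_TIMER1 + TIMER_CNTR0);
		hi = inb(IO_TIMER1 + TIMER_CNTR0);
		mtx_leave(&timer_mutex);

		return (hi << 8) | lo;
	}

Every delay(10) on a hatching CPU ends up taking and dropping that mutex
over and over while it polls the counter.
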
> When secondary CPUs come up, they will wait for CPU0 to let them boot up
> further by checking for a flag:
> 
>       /*
>        * We need to wait until we can identify, otherwise dmesg
>        * output will be messy.
>        */
>       while ((ci->ci_flags & CPUF_IDENTIFY) == 0)
>               delay(10);
> 
> Now that machine has 3 secondary cores that are spinning like that.  At
> the same time CPU0 waits for the core to come up:
> 
>       /* wait for it to identify */
>       for (i = 2000000; (ci->ci_flags & CPUF_IDENTIFY) && i > 0; i--)
>               delay(10);
> 
> That means we have 3-4 cores spinning just to be able to delay().  Our
> mutex implementation isn't fair, which means whoever manages to claim
> the free mutex wins.  Now if CPU2 and CPU3 are spinning all the time
> while CPU1 is identifying (and thus calling delay()) and CPU0 is
> waiting for CPU1, the one CPU that actually needs to make progress may
> never win the mutex.
> 
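To make the unfairness concrete: today's mtx_enter() just keeps retrying
a CAS on mtx_owner until it happens to win, so there is no ordering among
waiters at all.  Stripped of the IPL and DIAGNOSTIC bits it is essentially:

	/* Simplified view of the current, unfair acquire path. */
	while (atomic_cas_ptr(&mtx->mtx_owner, NULL, curcpu()) != NULL)
		CPU_BUSY_CYCLE();
	membar_enter_after_atomic();

Whoever's CAS lands right after a release wins, no matter how long the
other CPUs have already been spinning.
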
> I changed those delay(10) in cpu_hatch() to CPU_BUSY_CYCLE() and the
> boot got a bit further instead of hanging forever.
> 
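That is, the hatching CPUs then spin on the flag directly, without taking
the i8254 mutex through delay() at all:

	/* No delay(), so no timer mutex traffic while waiting. */
	while ((ci->ci_flags & CPUF_IDENTIFY) == 0)
		CPU_BUSY_CYCLE();
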
> Then I remembered an idea from years ago: fair kernel mutexes,
> basically mutexes implemented as a ticket lock, like our kernel lock.
> 
> I did a quick diff, which probably contains a million bugs, but with
> it bluhm's machine boots up fine.
> 
> I'm not saying this is the solution, but it might be.

So the idea really is that the kernel mutexes are cheap and simple
spin locks.  The assumption has always been that there shouldn't be a
lot of contention on them.  If you have contention, your locking
probably isn't fine-grained enough, or you're using the wrong lock
type.  Note that our mpsafe pmaps use a per-page mutex.  So increasing
the size of struct mutex is going to have a significant impact.
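
As a rough illustration (my numbers, not measured): the diff below adds
two unsigned int fields, i.e. 8 more bytes per mutex.  If that mutex sits
in every struct vm_page, a machine with 16 GB of RAM and 4 KB pages has
roughly 4 million pages, so the growth alone costs on the order of 32 MB.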

Maybe we need another lock type, although we already have one that
tries to be "fair": struct __mp_lock, which is what we use for the
kernel lock and the scheduler lock.  A non-recursive version of that
might make sense.
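
To sketch what I mean (made-up names, no IPL or WITNESS handling, just
the fairness part; not a proposal as-is):

	/* Hypothetical non-recursive ticket spinlock using <sys/atomic.h>. */
	struct ticket_lock {
		unsigned int	tl_next;	/* next ticket to hand out */
		unsigned int	tl_cur;		/* ticket currently served */
	};

	void
	ticket_lock_enter(struct ticket_lock *tl)
	{
		unsigned int t;

		/* Take a ticket; waiters get the lock in strict FIFO order. */
		t = atomic_inc_int_nv(&tl->tl_next) - 1;
		while (READ_ONCE(tl->tl_cur) != t)
			CPU_BUSY_CYCLE();
		membar_enter();
	}

	void
	ticket_lock_leave(struct ticket_lock *tl)
	{
		membar_exit();
		/* Only the holder writes tl_cur, so a plain store suffices. */
		tl->tl_cur++;
	}

That is more or less what the diff below grafts onto struct mutex; keeping
it as a separate type would leave struct mutex at its current size.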

> diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
> index 5cc55bb256a..c6a284beb51 100644
> --- a/sys/kern/kern_lock.c
> +++ b/sys/kern/kern_lock.c
> @@ -248,6 +248,8 @@ __mtx_init(struct mutex *mtx, int wantipl)
>       mtx->mtx_owner = NULL;
>       mtx->mtx_wantipl = wantipl;
>       mtx->mtx_oldipl = IPL_NONE;
> +     mtx->mtx_ticket = 0;
> +     mtx->mtx_cur = 0;
>  }
>  
>  #ifdef MULTIPROCESSOR
> @@ -255,15 +257,26 @@ void
>  mtx_enter(struct mutex *mtx)
>  {
>       struct schedstate_percpu *spc = &curcpu()->ci_schedstate;
> +     struct cpu_info *ci = curcpu();
> +     unsigned int t;
>  #ifdef MP_LOCKDEBUG
>       int nticks = __mp_lock_spinout;
>  #endif
> +     int s;
> +
> +     /* Avoid deadlocks after panic or in DDB */
> +     if (panicstr || db_active)
> +             return;
>  
>       WITNESS_CHECKORDER(MUTEX_LOCK_OBJECT(mtx),
>           LOP_EXCLUSIVE | LOP_NEWORDER, NULL);
>  
> +     if (mtx->mtx_wantipl != IPL_NONE)
> +             s = splraise(mtx->mtx_wantipl);
> +
>       spc->spc_spinning++;
> -     while (mtx_enter_try(mtx) == 0) {
> +     t = atomic_inc_int_nv(&mtx->mtx_ticket) - 1;
> +     while (READ_ONCE(mtx->mtx_cur) != t) {
>               CPU_BUSY_CYCLE();
>  
>  #ifdef MP_LOCKDEBUG
> @@ -275,12 +288,21 @@ mtx_enter(struct mutex *mtx)
>  #endif
>       }
>       spc->spc_spinning--;
> +
> +     mtx->mtx_owner = curcpu();
> +     if (mtx->mtx_wantipl != IPL_NONE)
> +             mtx->mtx_oldipl = s;
> +#ifdef DIAGNOSTIC
> +     ci->ci_mutex_level++;
> +#endif
> +     WITNESS_LOCK(MUTEX_LOCK_OBJECT(mtx), LOP_EXCLUSIVE);
>  }
>  
>  int
>  mtx_enter_try(struct mutex *mtx)
>  {
> -     struct cpu_info *owner, *ci = curcpu();
> +     struct cpu_info *ci = curcpu();
> +     unsigned int t;
>       int s;
>  
>       /* Avoid deadlocks after panic or in DDB */
> @@ -290,13 +312,15 @@ mtx_enter_try(struct mutex *mtx)
>       if (mtx->mtx_wantipl != IPL_NONE)
>               s = splraise(mtx->mtx_wantipl);
>  
> -     owner = atomic_cas_ptr(&mtx->mtx_owner, NULL, ci);
>  #ifdef DIAGNOSTIC
> -     if (__predict_false(owner == ci))
> +     if (__predict_false(mtx->mtx_owner == ci))
>               panic("mtx %p: locking against myself", mtx);
>  #endif
> -     if (owner == NULL) {
> +
> +     t = READ_ONCE(mtx->mtx_cur);
> +     if (atomic_cas_uint(&mtx->mtx_ticket, t, t + 1) == t) {
>               membar_enter_after_atomic();
> +             mtx->mtx_owner = curcpu();
>               if (mtx->mtx_wantipl != IPL_NONE)
>                       mtx->mtx_oldipl = s;
>  #ifdef DIAGNOSTIC
> @@ -369,6 +393,9 @@ mtx_leave(struct mutex *mtx)
>       membar_exit();
>  #endif
>       mtx->mtx_owner = NULL;
> +#ifdef MULTIPROCESSOR
> +     atomic_inc_int_nv(&mtx->mtx_cur);
> +#endif
>       if (mtx->mtx_wantipl != IPL_NONE)
>               splx(s);
>  }
> 
> 
