> Date: Sat, 22 May 2021 11:11:38 +0000
> From: Visa Hankala <v...@hankala.org>
> 
> On Wed, May 19, 2021 at 05:11:09PM -0500, Scott Cheloha wrote:
> > Hi,
> > 
> > visa@ says I need to unlock softintr_dispatch() before I can
> > unlock softclock(), so let's do that.
> > 
> > Additionally, when we call softintr_disestablish() we want to wait for
> > the underlying softintr handle to finish running if it is running.
> > 
> > We can start with amd64.
> > 
> > I think this approach will work (a rough sketch follows the list):
> > 
> > - Keep a pointer to the running softintr, if any, in the queue.  NULL
> >   the pointer when we return from sih_func().
> > 
> > - Take/release the kernel lock if the SI_MPSAFE flag is present when
> >   we enter/leave sih_func().
> > 
> > - If the handle is running when you call softintr_disestablish(), spin
> >   until the handle isn't running anymore and retry.
> > 
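> > A rough sketch of the disestablish part -- softintr_running is the
> > hypothetical "pointer to the running softintr" from the first item,
> > not existing code:
> > 
> > 	void
> > 	softintr_disestablish(void *arg)
> > 	{
> > 		struct x86_soft_intrhand *sih = arg;
> > 		struct x86_soft_intr *si = sih->sih_intrhead;
> > 
> > 		for (;;) {
> > 			mtx_enter(&si->softintr_lock);
> > 			if (sih->sih_pending) {
> > 				TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> > 				sih->sih_pending = 0;
> > 			}
> > 			if (si->softintr_running != sih)
> > 				break;
> > 			/* Handler is mid-run; drop the lock and retry. */
> > 			mtx_leave(&si->softintr_lock);
> > 			CPU_BUSY_CYCLE();
> > 		}
> > 		mtx_leave(&si->softintr_lock);
> > 		free(sih, M_DEVBUF, sizeof(*sih));
> > 	}
> > 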
> > There is no softintr manpage, but I think it is understood that
> > softintr_disestablish() is only safe to call from a process context,
> > otherwise you may deadlock.  Maybe we should do splassert(IPL_NONE)?
> > 
> > We could probably sleep here instead of spinning.  We'd have to change
> > every softintr_disestablish() implementation to do that, though.
> > Otherwise you'd have different behavior on different platforms.
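> > 
> > For example, with the queue mutex held, the waiter could do
> > something like this (untested; assuming msleep_nsec(9) fits here,
> > and softintr_running as in the sketch above):
> > 
> > 	while (si->softintr_running == sih)
> > 		msleep_nsec(sih, &si->softintr_lock, PWAIT, "sidisest",
> > 		    INFSLP);
> > 
> > with a matching wakeup(sih) in softintr_dispatch() once the handler
> > has returned.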
> 
> I think your diff does not pay enough attention to the fact that soft
> interrupts are handled by all CPUs. I think the diff that I posted
> a while ago [1] is better in that respect.
> 
> The two biggest things I do not like in my original diff are the
> synchronization of handler execution and the use of the SMR barrier.
> 
> [1] https://marc.info/?l=openbsd-tech&m=162092714911609
> 
> The kernel lock has guaranteed that at most one CPU is able to run
> a given soft interrupt handler at a time. My diff used a mutex to
> prevent concurrent execution. However, it is wasteful to spin. It would
> be more economical to let the current runner of the handler re-execute
> the code.
> 
> The SMR barrier in softintr_disestablish() was a trick to drain any
> pending activity. However, it made me feel uneasy because I have not
> checked every caller of softintr_disestablish(). My main worry is not
> the latency but unexpected side effects.
> 
> Below is a revised diff that improves the above two points.
> 
> When a soft interrupt handler is scheduled, it is assigned to a CPU.
> That CPU will keep running the handler as long as there are pending
> requests. Once all pending requests have been drained, the CPU
> relinquishes its hold of the handler. This provides natural
> serialization.
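> 
> Condensed, the hand-off in the diff below looks like this:
> 
> 	/* softintr_schedule(): the first request binds the handler. */
> 	if (sih->sih_pending == 0 && sih->sih_owner == NULL) {
> 		WRITE_ONCE(sih->sih_owner, cursip);
> 		trigger = 1;	/* raise the soft interrupt on this CPU */
> 	}
> 
> 	/* softintr_dispatch(): unbind once the handler has run and
> 	 * no new request has arrived in the meantime. */
> 	if (sih->sih_pending == 0)
> 		WRITE_ONCE(sih->sih_owner, NULL);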
> 
> Now softintr_disestablish() uses spinning to drain activity.
> I still have slight qualms about this, though, because this behavior
> has not been explicit before. Integration with witness(4) might be
> in order.
> 
> softintr_disestablish() uses READ_ONCE() to enforce reloading of the
> value in the busy-wait loop. This way the variable does not need to be
> volatile. (As yet another option, CPU_BUSY_CYCLE() could always
> imply memory clobbering, which should make an optimizing compiler
> redo the load.) For consistency with this READ_ONCE(), WRITE_ONCE() is
> used whenever the variable is written, excluding the initialization.
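> 
> (READ_ONCE() achieves this by going through a volatile-qualified
> lvalue; its definition is roughly:
> 
> 	#define READ_ONCE(x) ({					\
> 		typeof(x) __tmp = *(volatile typeof(x) *)&(x);	\
> 		__tmp;						\
> 	})
> 
> so the compiler has to redo the load on every iteration of the
> busy-wait loop even though the variable itself is not volatile.)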
> 
> The patch uses a single mutex for access serialization. The old code
> used one mutex per soft IPL level, but I am not sure how useful
> that has been. I think it would be better to have a separate
> mutex for each CPU. However, the increased code complexity might not
> be worthwhile at the moment. Even having the per-CPU queues has
> a whiff of being somewhat overkill.

A few comments:

* Looking at amd64 in isolation does not make sense.  Like a lot of MD
  code in OpenBSD, the softintr code was copied from whatever
  Net/FreeBSD had at the time, with no attempt at unification (it
  works, check it in, don't go back to clean it up).  However, with
  powerpc64 and riscv64 we try to do things a little bit better in
  that area.  So arm64, powerpc64 and riscv64 share the same softintr
  implementation that already implements softintr_establish_flags()
  with SOFTINTR_ESTABLISH_MPSAFE.  Now we haven't used that flag
  anywhere in our tree yet, so the code might be completely busted.
  But it may make a lot of sense to migrate other architectures to the
  same codebase.

* The softintr_disestablish() function isn't used a lot in our tree.
  It may make sense to postpone worrying about safely disestablishing
  mpsafe soft interrupts for now and simply panic if someone tries to
  do this (see the first sketch after this list).

* Wouldn't it make sense for an mpsafe soft interrupt to protect
  itself from running simultaneously on multiple CPUs?  It probably
  already needs some sort of lock to protect the handler and other
  code running in process context on other CPUs (see the second
  sketch after this list).  And some handlers may be safe to run
  simultaneously anyway.

* I think we should avoid MAXCPUS arrays if we can; adding stuff to
  struct cpu_info is probably a better approach here (see the third
  sketch below).
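
  On the second point, that could be a guard at the top of
  softintr_disestablish(), e.g. (sketch; SIF_MPSAFE as in the diff
  below):

	if (sih->sih_flags & SIF_MPSAFE)
		panic("%s: mpsafe soft interrupt", __func__);

  On the third point, a sketch with a hypothetical driver softc that
  shares one mutex between the handler and process context:

	void
	sc_softintr(void *arg)
	{
		struct sc_softc *sc = arg;

		/* Excludes process context and other CPUs. */
		mtx_enter(&sc->sc_mtx);
		sc_do_work(sc);
		mtx_leave(&sc->sc_mtx);
	}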

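  And on the last point, moving the per-CPU queues into struct
  cpu_info could look roughly like this (ci_softintrq is a
  hypothetical member name):

	/* in struct cpu_info */
	TAILQ_HEAD(, x86_soft_intrhand) ci_softintrq[X86_NSOFTINTR];

  softintr_dispatch() would then walk curcpu()->ci_softintrq[which]
  instead of indexing a MAXCPUS-sized global array.
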
Cheers,

Mark

> Index: arch/amd64/amd64/softintr.c
> ===================================================================
> RCS file: src/sys/arch/amd64/amd64/softintr.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 softintr.c
> --- arch/amd64/amd64/softintr.c       11 Sep 2020 09:27:09 -0000      1.10
> +++ arch/amd64/amd64/softintr.c       22 May 2021 11:07:23 -0000
> @@ -36,13 +36,37 @@
>  
>  #include <sys/param.h>
>  #include <sys/systm.h>
> +#include <sys/atomic.h>
>  #include <sys/malloc.h>
> +#include <sys/mutex.h>
> +#include <sys/percpu.h>
> +#include <sys/queue.h>
>  
>  #include <machine/intr.h>
>  
>  #include <uvm/uvm_extern.h>
>  
> -struct x86_soft_intr x86_soft_intrs[X86_NSOFTINTR];
> +struct x86_soft_intr_percpu;
> +
> +struct x86_soft_intrhand {
> +     TAILQ_ENTRY(x86_soft_intrhand)  sih_q;
> +     void                            (*sih_fn)(void *);
> +     void                            *sih_arg;
> +     struct x86_soft_intr_percpu     *sih_owner;
> +     int                             sih_which;
> +     unsigned short                  sih_flags;
> +     unsigned short                  sih_pending;
> +};
> +
> +#define SIF_DYING            0x0001
> +#define SIF_MPSAFE           0x0002
> +
> +struct x86_soft_intr_percpu {
> +     TAILQ_HEAD(, x86_soft_intrhand) sip_queue[X86_NSOFTINTR];
> +} __aligned(CACHELINESIZE);
> +
> +struct x86_soft_intr_percpu x86_soft_intr_percpu[MAXCPUS];
> +struct mutex softintr_lock = MUTEX_INITIALIZER(IPL_HIGH);
>  
>  const int x86_soft_intr_to_ssir[X86_NSOFTINTR] = {
>       SIR_CLOCK,
> @@ -58,14 +82,13 @@ const int x86_soft_intr_to_ssir[X86_NSOF
>  void
>  softintr_init(void)
>  {
> -     struct x86_soft_intr *si;
> -     int i;
> +     struct x86_soft_intr_percpu *sip;
> +     int i, j;
>  
> -     for (i = 0; i < X86_NSOFTINTR; i++) {
> -             si = &x86_soft_intrs[i];
> -             TAILQ_INIT(&si->softintr_q);
> -             mtx_init(&si->softintr_lock, IPL_HIGH);
> -             si->softintr_ssir = x86_soft_intr_to_ssir[i];
> +     for (i = 0; i < MAXCPUS; i++) {
> +             sip = &x86_soft_intr_percpu[i];
> +             for (j = 0; j < X86_NSOFTINTR; j++)
> +                     TAILQ_INIT(&sip->sip_queue[j]);
>       }
>  }
>  
> @@ -78,31 +101,38 @@ void
>  softintr_dispatch(int which)
>  {
>       struct cpu_info *ci = curcpu();
> -     struct x86_soft_intr *si = &x86_soft_intrs[which];
>       struct x86_soft_intrhand *sih;
> +     struct x86_soft_intr_percpu *sip;
>       int floor;
>  
> +     sip = &x86_soft_intr_percpu[CPU_INFO_UNIT(ci)];
> +
>       floor = ci->ci_handled_intr_level;
>       ci->ci_handled_intr_level = ci->ci_ilevel;
>  
> -     KERNEL_LOCK();
> -     for (;;) {
> -             mtx_enter(&si->softintr_lock);
> -             sih = TAILQ_FIRST(&si->softintr_q);
> -             if (sih == NULL) {
> -                     mtx_leave(&si->softintr_lock);
> -                     break;
> -             }
> -             TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> +     mtx_enter(&softintr_lock);
> +     while (!TAILQ_EMPTY(&sip->sip_queue[which])) {
> +             sih = TAILQ_FIRST(&sip->sip_queue[which]);
> +             KASSERT(sih->sih_owner == sip);
> +             KASSERT(sih->sih_pending);
>               sih->sih_pending = 0;
> +             TAILQ_REMOVE(&sip->sip_queue[which], sih, sih_q);
> +             mtx_leave(&softintr_lock);
>  
> -             uvmexp.softs++;
> -
> -             mtx_leave(&si->softintr_lock);
> +             if (sih->sih_flags & SIF_MPSAFE) {
> +                     (*sih->sih_fn)(sih->sih_arg);
> +             } else {
> +                     KERNEL_LOCK();
> +                     (*sih->sih_fn)(sih->sih_arg);
> +                     KERNEL_UNLOCK();
> +             }
>  
> -             (*sih->sih_fn)(sih->sih_arg);
> +             mtx_enter(&softintr_lock);
> +             uvmexp.softs++;
> +             if (sih->sih_pending == 0)
> +                     WRITE_ONCE(sih->sih_owner, NULL);
>       }
> -     KERNEL_UNLOCK();
> +     mtx_leave(&softintr_lock);
>  
>       ci->ci_handled_intr_level = floor;
>  }
> @@ -115,9 +145,13 @@ softintr_dispatch(int which)
>  void *
>  softintr_establish(int ipl, void (*func)(void *), void *arg)
>  {
> -     struct x86_soft_intr *si;
>       struct x86_soft_intrhand *sih;
>       int which;
> +     unsigned short flags = 0;
> +
> +     if (ipl & IPL_MPSAFE)
> +             flags |= SIF_MPSAFE;
> +     ipl &= ~IPL_MPSAFE;
>  
>       switch (ipl) {
>       case IPL_SOFTCLOCK:
> @@ -137,13 +171,13 @@ softintr_establish(int ipl, void (*func)
>               panic("softintr_establish");
>       }
>  
> -     si = &x86_soft_intrs[which];
> -
>       sih = malloc(sizeof(*sih), M_DEVBUF, M_NOWAIT);
>       if (__predict_true(sih != NULL)) {
> -             sih->sih_intrhead = si;
>               sih->sih_fn = func;
>               sih->sih_arg = arg;
> +             sih->sih_owner = NULL;
> +             sih->sih_which = which;
> +             sih->sih_flags = flags;
>               sih->sih_pending = 0;
>       }
>       return (sih);
> @@ -158,14 +192,57 @@ void
>  softintr_disestablish(void *arg)
>  {
>       struct x86_soft_intrhand *sih = arg;
> -     struct x86_soft_intr *si = sih->sih_intrhead;
> +     struct x86_soft_intr_percpu *sip;
> +
> +     mtx_enter(&softintr_lock);
> +
> +     sih->sih_flags |= SIF_DYING;
>  
> -     mtx_enter(&si->softintr_lock);
>       if (sih->sih_pending) {
> -             TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> +             sip = sih->sih_owner;
> +             KASSERT(sip != NULL);
> +             TAILQ_REMOVE(&sip->sip_queue[sih->sih_which], sih, sih_q);
>               sih->sih_pending = 0;
>       }
> -     mtx_leave(&si->softintr_lock);
> +
> +     if (sih->sih_owner != NULL) {
> +             mtx_leave(&softintr_lock);
> +             while (READ_ONCE(sih->sih_owner) != NULL)
> +                     CPU_BUSY_CYCLE();
> +             /* Synchronize this CPU's view of memory. */
> +             mtx_enter(&softintr_lock);
> +     }
> +
> +     mtx_leave(&softintr_lock);
>  
>       free(sih, M_DEVBUF, sizeof(*sih));
>  }
> +
> +void
> +softintr_schedule(void *arg)
> +{
> +     struct x86_soft_intr_percpu *cursip, *sip;
> +     struct x86_soft_intrhand *sih = arg;
> +     int trigger = 0;
> +
> +     cursip = &x86_soft_intr_percpu[cpu_number()];
> +
> +     mtx_enter(&softintr_lock);
> +
> +     KASSERT((sih->sih_flags & SIF_DYING) == 0);
> +
> +     if (sih->sih_pending == 0) {
> +             if (sih->sih_owner == NULL) {
> +                     WRITE_ONCE(sih->sih_owner, cursip);
> +                     trigger = 1;
> +             }
> +             sip = sih->sih_owner;
> +             TAILQ_INSERT_TAIL(&sip->sip_queue[sih->sih_which], sih, sih_q);
> +             sih->sih_pending = 1;
> +     }
> +
> +     mtx_leave(&softintr_lock);
> +
> +     if (trigger)
> +             softintr(x86_soft_intr_to_ssir[sih->sih_which]);
> +}
> Index: arch/amd64/include/intr.h
> ===================================================================
> RCS file: src/sys/arch/amd64/include/intr.h,v
> retrieving revision 1.32
> diff -u -p -r1.32 intr.h
> --- arch/amd64/include/intr.h 17 Jun 2020 06:14:52 -0000      1.32
> +++ arch/amd64/include/intr.h 22 May 2021 11:07:24 -0000
> @@ -231,42 +231,13 @@ extern void (*ipifunc[X86_NIPI])(struct 
>  #define      X86_NSOFTINTR                   3
>  
>  #ifndef _LOCORE
> -#include <sys/queue.h>
> -
> -struct x86_soft_intrhand {
> -     TAILQ_ENTRY(x86_soft_intrhand)
> -             sih_q;
> -     struct x86_soft_intr *sih_intrhead;
> -     void    (*sih_fn)(void *);
> -     void    *sih_arg;
> -     int     sih_pending;
> -};
> -
> -struct x86_soft_intr {
> -     TAILQ_HEAD(, x86_soft_intrhand)
> -                     softintr_q;
> -     int             softintr_ssir;
> -     struct mutex    softintr_lock;
> -};
>  
>  void *softintr_establish(int, void (*)(void *), void *);
>  void softintr_disestablish(void *);
>  void softintr_init(void);
>  void softintr_dispatch(int);
> +void softintr_schedule(void *);
>  
> -#define      softintr_schedule(arg)                                          \
> -do {                                                                 \
> -     struct x86_soft_intrhand *__sih = (arg);                        \
> -     struct x86_soft_intr *__si = __sih->sih_intrhead;               \
> -                                                                     \
> -     mtx_enter(&__si->softintr_lock);                                \
> -     if (__sih->sih_pending == 0) {                                  \
> -             TAILQ_INSERT_TAIL(&__si->softintr_q, __sih, sih_q);     \
> -             __sih->sih_pending = 1;                                 \
> -             softintr(__si->softintr_ssir);                          \
> -     }                                                               \
> -     mtx_leave(&__si->softintr_lock);                                \
> -} while (/*CONSTCOND*/ 0)
>  #endif /* _LOCORE */
>  
>  #endif /* !_MACHINE_INTR_H_ */
> 
> 
