On Sat, May 22, 2021 at 02:33:47PM +0200, Mark Kettenis wrote:
> > Date: Sat, 22 May 2021 11:11:38 +0000
> > From: Visa Hankala <[email protected]>
> > 
> > On Wed, May 19, 2021 at 05:11:09PM -0500, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > visa@ says I need to unlock softintr_dispatch() before I can
> > > unlock softclock(), so let's do that.
> > > 
> > > Additionally, when we call softintr_disestablish() we want to wait for
> > > the underlying softintr handle to finish running if it is running.
> > > 
> > > We can start with amd64.
> > > 
> > > I think this approach will work:
> > > 
> > > - Keep a pointer to the running softintr, if any, in the queue.  NULL
> > >   the pointer when we return from sih_func().
> > > 
> > > - Take/release the kernel lock if the SI_MPSAFE flag is present when
> > >   we enter/leave sih_func().
> > > 
> > > - If the handle is running when you call softintr_disestablish(), spin
> > >   until the handle isn't running anymore and retry.
> > > 
> > > There is no softintr manpage but I think it is understood that
> > > softintr_disestablish() is only safe to call from a process context,
> > > otherwise you may deadlock.  Maybe we should do splassert(IPL_NONE)?
> > > 
> > > We could probably sleep here instead of spinning.  We'd have to change
> > > every softintr_disestablish() implementation to do that, though.
> > > Otherwise you'd have different behavior on different platforms.
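> > > 
> > > A sleeping variant could look roughly like the sketch below.  This
> > > is only an illustration: the sih_running flag and the wakeup() in
> > > the dispatcher are hypothetical, not something in the tree today.
> > > 
> > >         /* In softintr_disestablish(), instead of busy-waiting: */
> > >         while (sih->sih_running) {
> > >                 /* softintr_dispatch() would wakeup(sih) once the
> > >                  * handler has returned. */
> > >                 tsleep_nsec(sih, PWAIT, "sidrain", MSEC_TO_NSEC(1));
> > >         }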
> > 
> > I think your diff does not pay enough attention to the fact that soft
> > interrupts are handled by all CPUs. I think the diff that I posted
> > a while ago [1] is better in that respect.
> > 
> > The two biggest things that I do not like in my original diff are the
> > synchronization of handler execution and the use of the SMR barrier.
> > 
> > [1] https://marc.info/?l=openbsd-tech&m=162092714911609
> > 
> > The kernel lock has guaranteed that at most one CPU is able to run
> > a given soft interrupt handler at a time. My diff used a mutex to
> > prevent concurrent execution. However, it is wasteful to spin. It would
> > be more economical to let the current runner of the handler re-execute
> > the code.
> > 
> > The SMR barrier in softintr_disestablish() was a trick to drain any
> > pending activity. However, it made me feel uneasy because I have not
> > checked every caller of softintr_disestablish(). My main worry is not
> > the latency but unexpected side effects.
> > 
> > Below is a revised diff that improves the above two points.
> > 
> > When a soft interrupt handler is scheduled, it is assigned to a CPU.
> > That CPU will keep running the handler as long as there are pending
> > requests. Once all pending requests have been drained, the CPU
> > relinquishes its hold of the handler. This provides natural
> > serialization.
> > 
> > Now softintr_disestablish() uses spinning for draining activity.
> > I still have slight qualms about this, though, because the feature
> > has not been so explicit before. Integration with witness(4) might be
> > in order.
> > 
> > softintr_disestablish() uses READ_ONCE() to enforce reloading of the
> > value in the busy-wait loop. This way the variable does not need to be
> > volatile. (As yet another option, CPU_BUSY_CYCLE() could always
> > imply memory clobbering, which should make an optimizing compiler
> > redo the load.) For consistency with this READ_ONCE(), WRITE_ONCE() is
> > used whenever the variable is written, excluding the initialization.
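> > 
> > For illustration, the CPU_BUSY_CYCLE() option would amount to
> > something like the sketch below; this is not the current
> > <machine/cpu.h> definition, just what the option implies:
> > 
> >         /*
> >          * The "memory" clobber makes the compiler reload
> >          * sih->sih_owner on every iteration of the busy-wait loop,
> >          * so READ_ONCE() would no longer be needed there.
> >          */
> >         #define CPU_BUSY_CYCLE()  __asm volatile("pause" ::: "memory")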
> > 
> > The patch uses a single mutex for access serialization. The old code
> > used one mutex per soft IPL level, but I am not sure how
> > useful that has been. I think it would be better to have a separate
> > mutex for each CPU. However, the increased code complexity might not
> > be worthwhile at the moment. Even having the per-CPU queues has
> > a whiff of being somewhat overkill.
> 
> A few comments:
> 
> * Looking at amd64 in isolation does not make sense.  Like a lot of MD
>   code in OpenBSD the softintr code was copied from whatever
>   Net/FreeBSD had at the time, with no attempt at unification (it
>   works, check it in, don't go back to clean it up).  However, with
>   powerpc64 and riscv64 we try to do things a little bit better in
>   that area.  So arm64, powerpc64 and riscv64 share the same softintr
>   implementation that already implements softintr_establish_flags()
>   with SOFTINTR_ESTABLISH_MPSAFE.  Now we haven't used that flag
>   anywhere in our tree yet, so the code might be completely busted.
>   But it may make a lot of sense to migrate other architectures to the
>   same codebase.
> 
> * The softintr_disestablish() function isn't used a lot in our tree.
>   It may make sense to postpone worrying about safely disestablishing
>   mpsafe soft interrupts for now and simply panic if someone tries to
>   do this.
> 
> * Wouldn't it make sense for an mpsafe soft interrupt to protect
>   itself from running simultaneously on multiple CPUs?  It probably
>   already needs some sort of lock to protect the handler and other
>   code running in process context on other CPUs.  And some handlers
>   may be safe to run simultaneously anyway.
> 
> * I think we should avoid MAXCPU arrays if we can; adding stuff to
>   struct cpu_info is probably a better approach here.
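> 
>   Roughly something like this, as an untested sketch; the ci_softintr
>   member name is made up:
> 
>         /*
>          * machine/cpu.h: embed the per-CPU soft interrupt queues in
>          * struct cpu_info instead of a MAXCPUS-sized array indexed
>          * by CPU_INFO_UNIT().
>          */
>         struct cpu_info {
>                 /* ... existing members ... */
>                 struct x86_soft_intr_percpu ci_softintr;
>         };
> 
>         /* softintr_dispatch() and softintr_schedule() then use: */
>         sip = &curcpu()->ci_softintr;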

Which reminds me of that 160-core ARM64 machine I have access to...

> Cheers,
> 
> Mark
> 
> > Index: arch/amd64/amd64/softintr.c
> > ===================================================================
> > RCS file: src/sys/arch/amd64/amd64/softintr.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 softintr.c
> > --- arch/amd64/amd64/softintr.c     11 Sep 2020 09:27:09 -0000      1.10
> > +++ arch/amd64/amd64/softintr.c     22 May 2021 11:07:23 -0000
> > @@ -36,13 +36,37 @@
> >  
> >  #include <sys/param.h>
> >  #include <sys/systm.h>
> > +#include <sys/atomic.h>
> >  #include <sys/malloc.h>
> > +#include <sys/mutex.h>
> > +#include <sys/percpu.h>
> > +#include <sys/queue.h>
> >  
> >  #include <machine/intr.h>
> >  
> >  #include <uvm/uvm_extern.h>
> >  
> > -struct x86_soft_intr x86_soft_intrs[X86_NSOFTINTR];
> > +struct x86_soft_intr_percpu;
> > +
> > +struct x86_soft_intrhand {
> > +   TAILQ_ENTRY(x86_soft_intrhand)  sih_q;
> > +   void                            (*sih_fn)(void *);
> > +   void                            *sih_arg;
> > +   struct x86_soft_intr_percpu     *sih_owner;
> > +   int                             sih_which;
> > +   unsigned short                  sih_flags;
> > +   unsigned short                  sih_pending;
> > +};
> > +
> > +#define SIF_DYING          0x0001
> > +#define SIF_MPSAFE         0x0002
> > +
> > +struct x86_soft_intr_percpu {
> > +   TAILQ_HEAD(, x86_soft_intrhand) sip_queue[X86_NSOFTINTR];
> > +} __aligned(CACHELINESIZE);
> > +
> > +struct x86_soft_intr_percpu x86_soft_intr_percpu[MAXCPUS];
> > +struct mutex softintr_lock = MUTEX_INITIALIZER(IPL_HIGH);
> >  
> >  const int x86_soft_intr_to_ssir[X86_NSOFTINTR] = {
> >     SIR_CLOCK,
> > @@ -58,14 +82,13 @@ const int x86_soft_intr_to_ssir[X86_NSOF
> >  void
> >  softintr_init(void)
> >  {
> > -   struct x86_soft_intr *si;
> > -   int i;
> > +   struct x86_soft_intr_percpu *sip;
> > +   int i, j;
> >  
> > -   for (i = 0; i < X86_NSOFTINTR; i++) {
> > -           si = &x86_soft_intrs[i];
> > -           TAILQ_INIT(&si->softintr_q);
> > -           mtx_init(&si->softintr_lock, IPL_HIGH);
> > -           si->softintr_ssir = x86_soft_intr_to_ssir[i];
> > +   for (i = 0; i < MAXCPUS; i++) {
> > +           sip = &x86_soft_intr_percpu[i];
> > +           for (j = 0; j < X86_NSOFTINTR; j++)
> > +                   TAILQ_INIT(&sip->sip_queue[j]);
> >     }
> >  }
> >  
> > @@ -78,31 +101,38 @@ void
> >  softintr_dispatch(int which)
> >  {
> >     struct cpu_info *ci = curcpu();
> > -   struct x86_soft_intr *si = &x86_soft_intrs[which];
> >     struct x86_soft_intrhand *sih;
> > +   struct x86_soft_intr_percpu *sip;
> >     int floor;
> >  
> > +   sip = &x86_soft_intr_percpu[CPU_INFO_UNIT(ci)];
> > +
> >     floor = ci->ci_handled_intr_level;
> >     ci->ci_handled_intr_level = ci->ci_ilevel;
> >  
> > -   KERNEL_LOCK();
> > -   for (;;) {
> > -           mtx_enter(&si->softintr_lock);
> > -           sih = TAILQ_FIRST(&si->softintr_q);
> > -           if (sih == NULL) {
> > -                   mtx_leave(&si->softintr_lock);
> > -                   break;
> > -           }
> > -           TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> > +   mtx_enter(&softintr_lock);
> > +   while (!TAILQ_EMPTY(&sip->sip_queue[which])) {
> > +           sih = TAILQ_FIRST(&sip->sip_queue[which]);
> > +           KASSERT(sih->sih_owner == sip);
> > +           KASSERT(sih->sih_pending);
> >             sih->sih_pending = 0;
> > +           TAILQ_REMOVE(&sip->sip_queue[which], sih, sih_q);
> > +           mtx_leave(&softintr_lock);
> >  
> > -           uvmexp.softs++;
> > -
> > -           mtx_leave(&si->softintr_lock);
> > +           if (sih->sih_flags & SIF_MPSAFE) {
> > +                   (*sih->sih_fn)(sih->sih_arg);
> > +           } else {
> > +                   KERNEL_LOCK();
> > +                   (*sih->sih_fn)(sih->sih_arg);
> > +                   KERNEL_UNLOCK();
> > +           }
> >  
> > -           (*sih->sih_fn)(sih->sih_arg);
> > +           mtx_enter(&softintr_lock);
> > +           uvmexp.softs++;
> > +           if (sih->sih_pending == 0)
> > +                   WRITE_ONCE(sih->sih_owner, NULL);
> >     }
> > -   KERNEL_UNLOCK();
> > +   mtx_leave(&softintr_lock);
> >  
> >     ci->ci_handled_intr_level = floor;
> >  }
> > @@ -115,9 +145,13 @@ softintr_dispatch(int which)
> >  void *
> >  softintr_establish(int ipl, void (*func)(void *), void *arg)
> >  {
> > -   struct x86_soft_intr *si;
> >     struct x86_soft_intrhand *sih;
> >     int which;
> > +   unsigned short flags = 0;
> > +
> > +   if (ipl & IPL_MPSAFE)
> > +           flags |= SIF_MPSAFE;
> > +   ipl &= ~IPL_MPSAFE;
> >  
> >     switch (ipl) {
> >     case IPL_SOFTCLOCK:
> > @@ -137,13 +171,13 @@ softintr_establish(int ipl, void (*func)
> >             panic("softintr_establish");
> >     }
> >  
> > -   si = &x86_soft_intrs[which];
> > -
> >     sih = malloc(sizeof(*sih), M_DEVBUF, M_NOWAIT);
> >     if (__predict_true(sih != NULL)) {
> > -           sih->sih_intrhead = si;
> >             sih->sih_fn = func;
> >             sih->sih_arg = arg;
> > +           sih->sih_owner = NULL;
> > +           sih->sih_which = which;
> > +           sih->sih_flags = flags;
> >             sih->sih_pending = 0;
> >     }
> >     return (sih);
> > @@ -158,14 +192,57 @@ void
> >  softintr_disestablish(void *arg)
> >  {
> >     struct x86_soft_intrhand *sih = arg;
> > -   struct x86_soft_intr *si = sih->sih_intrhead;
> > +   struct x86_soft_intr_percpu *sip;
> > +
> > +   mtx_enter(&softintr_lock);
> > +
> > +   sih->sih_flags |= SIF_DYING;
> >  
> > -   mtx_enter(&si->softintr_lock);
> >     if (sih->sih_pending) {
> > -           TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> > +           sip = sih->sih_owner;
> > +           KASSERT(sip != NULL);
> > +           TAILQ_REMOVE(&sip->sip_queue[sih->sih_which], sih, sih_q);
> >             sih->sih_pending = 0;
> >     }
> > -   mtx_leave(&si->softintr_lock);
> > +
> > +   if (sih->sih_owner != NULL) {
> > +           mtx_leave(&softintr_lock);
> > +           while (READ_ONCE(sih->sih_owner) != NULL)
> > +                   CPU_BUSY_CYCLE();
> > +           /* Synchronize this CPU's view of memory. */
> > +           mtx_enter(&softintr_lock);
> > +   }
> > +
> > +   mtx_leave(&softintr_lock);
> >  
> >     free(sih, M_DEVBUF, sizeof(*sih));
> >  }
> > +
> > +void
> > +softintr_schedule(void *arg)
> > +{
> > +   struct x86_soft_intr_percpu *cursip, *sip;
> > +   struct x86_soft_intrhand *sih = arg;
> > +   int trigger = 0;
> > +
> > +   cursip = &x86_soft_intr_percpu[cpu_number()];
> > +
> > +   mtx_enter(&softintr_lock);
> > +
> > +   KASSERT((sih->sih_flags & SIF_DYING) == 0);
> > +
> > +   if (sih->sih_pending == 0) {
> > +           if (sih->sih_owner == NULL) {
> > +                   WRITE_ONCE(sih->sih_owner, cursip);
> > +                   trigger = 1;
> > +           }
> > +           sip = sih->sih_owner;
> > +           TAILQ_INSERT_TAIL(&sip->sip_queue[sih->sih_which], sih, sih_q);
> > +           sih->sih_pending = 1;
> > +   }
> > +
> > +   mtx_leave(&softintr_lock);
> > +
> > +   if (trigger)
> > +           softintr(x86_soft_intr_to_ssir[sih->sih_which]);
> > +}
> > Index: arch/amd64/include/intr.h
> > ===================================================================
> > RCS file: src/sys/arch/amd64/include/intr.h,v
> > retrieving revision 1.32
> > diff -u -p -r1.32 intr.h
> > --- arch/amd64/include/intr.h       17 Jun 2020 06:14:52 -0000      1.32
> > +++ arch/amd64/include/intr.h       22 May 2021 11:07:24 -0000
> > @@ -231,42 +231,13 @@ extern void (*ipifunc[X86_NIPI])(struct 
> >  #define    X86_NSOFTINTR                   3
> >  
> >  #ifndef _LOCORE
> > -#include <sys/queue.h>
> > -
> > -struct x86_soft_intrhand {
> > -   TAILQ_ENTRY(x86_soft_intrhand)
> > -           sih_q;
> > -   struct x86_soft_intr *sih_intrhead;
> > -   void    (*sih_fn)(void *);
> > -   void    *sih_arg;
> > -   int     sih_pending;
> > -};
> > -
> > -struct x86_soft_intr {
> > -   TAILQ_HEAD(, x86_soft_intrhand)
> > -                   softintr_q;
> > -   int             softintr_ssir;
> > -   struct mutex    softintr_lock;
> > -};
> >  
> >  void       *softintr_establish(int, void (*)(void *), void *);
> >  void       softintr_disestablish(void *);
> >  void       softintr_init(void);
> >  void       softintr_dispatch(int);
> > +void       softintr_schedule(void *);
> >  
> > -#define    softintr_schedule(arg)                                          \
> > -do {                                                                       \
> > -   struct x86_soft_intrhand *__sih = (arg);                        \
> > -   struct x86_soft_intr *__si = __sih->sih_intrhead;               \
> > -                                                                   \
> > -   mtx_enter(&__si->softintr_lock);                                \
> > -   if (__sih->sih_pending == 0) {                                  \
> > -           TAILQ_INSERT_TAIL(&__si->softintr_q, __sih, sih_q);     \
> > -           __sih->sih_pending = 1;                                 \
> > -           softintr(__si->softintr_ssir);                          \
> > -   }                                                               \
> > -   mtx_leave(&__si->softintr_lock);                                \
> > -} while (/*CONSTCOND*/ 0)
> >  #endif /* _LOCORE */
> >  
> >  #endif /* !_MACHINE_INTR_H_ */
> > 
> > 
> 
