On Sat, May 22, 2021 at 02:23:57PM +0000, Visa Hankala wrote:
> On Sat, May 22, 2021 at 02:33:47PM +0200, Mark Kettenis wrote:
> > > Date: Sat, 22 May 2021 11:11:38 +0000
> > > From: Visa Hankala <v...@hankala.org>
> > > 
> > > On Wed, May 19, 2021 at 05:11:09PM -0500, Scott Cheloha wrote:
> > > > Hi,
> > > > 
> > > > visa@ says I need to unlock softintr_dispatch() before I can
> > > > unlock softclock(), so let's do that.
> > > > 
> > > > Additionally, when we call softintr_disestablish() we want to wait for
> > > > the underlying softintr handle to finish running if it is running.
> > > > 
> > > > We can start with amd64.
> > > > 
> > > > I think this approach will work:
> > > > 
> > > > - Keep a pointer to the running softintr, if any, in the queue.  NULL
> > > >   the pointer when we return from sih_func().
> > > > 
> > > > - Take/release the kernel lock if the SI_MPSAFE flag is present when
> > > >   we enter/leave sih_func().
> > > > 
> > > > - If the handle is running when you call softintr_disestablish(), spin
> > > >   until the handle isn't running anymore and retry.
> > > > 
> > > > There is no softintr manpage but I think it is understood that
> > > > softintr_disestablish() is only safe to call from a process context,
> > > > otherwise you may deadlock.  Maybe we should do splassert(IPL_NONE)?
> > > > 
> > > > We could probably sleep here instead of spinning.  We'd have to change
> > > > every softintr_disestablish() implementation to do that, though.
> > > > Otherwise you'd have different behavior on different platforms.
> > > 
> > > I think your diff does not pay enough attention to the fact that soft
> > > interrupts are handled by all CPUs. I think the diff that I posted
> > > a while ago [1] is better in that respect.
> > > 
> > > The two biggest things that I do not like in my original diff are the
> > > synchronization of handler execution and the use of the SMR barrier.
> > > 
> > > [1] https://marc.info/?l=openbsd-tech&m=162092714911609
> > > 
> > > The kernel lock has guaranteed that at most one CPU is able to run
> > > a given soft interrupt handler at a time. My diff used a mutex to
> > > prevent concurrent execution. However, it is wasteful to spin. It would
> > > be more economical to let the current runner of the handler re-execute
> > > the code.
> > > 
> > > The SMR barrier in softintr_disestablish() was a trick to drain any
> > > pending activity. However, it made me feel uneasy because I have not
> > > checked every caller of softintr_disestablish(). My main worry is not
> > > the latency but unexpected side effects.
> > > 
> > > Below is a revised diff that improves the above two points.
> > > 
> > > When a soft interrupt handler is scheduled, it is assigned to a CPU.
> > > That CPU will keep running the handler as long as there are pending
> > > requests. Once all pending requests have been drained, the CPU
> > > relinquishes its hold of the handler. This provides natural
> > > serialization.
> > > 
> > > Now softintr_disestablish() uses spinning for draining activity.
> > > I still have slight qualms about this, though, because the feature
> > > has not been so explicit before. Integration with witness(4) might be
> > > in order.
> > > 
> > > softintr_disestablish() uses READ_ONCE() to enforce reloading of the
> > > value in the busy-wait loop. This way the variable does not need to be
> > > volatile. (As yet another option, CPU_BUSY_CYCLE() could always
> > > imply memory clobbering, which should make an optimizing compiler
> > > redo the load.) For consistency with this READ_ONCE(), WRITE_ONCE() is
> > > used whenever the variable is written, excluding the initialization.
> > > 
> > > The patch uses a single mutex for access serialization. The old code
> > > has used one mutex per soft IPL level, but I am not sure how
> > > useful that has been. I think it would be better to have a separate
> > > mutex for each CPU. However, the increased code complexity might not
> > > be worthwhile at the moment. Even having the per-CPU queues has
> > > a whiff of being somewhat overkill.
> > 
> > A few comments:
> > 
> > * Looking at amd64 in isolation does not make sense.  Like a lot of MD
> >   code in OpenBSD the softintr code was copied from whatever
> >   Net/FreeBSD had at the time, with no attempt at unification (it
> >   works, check it in, don't go back to clean it up).  However, with
> >   powerpc64 and riscv64 we try to do things a little bit better in
> >   that area.  So arm64, powerpc64 and riscv64 share the same softintr
> >   implementation that already implements softintr_establish_flags()
> >   with SOFTINTR_ESTABLISH_MPSAFE.  Now we haven't used that flag
> >   anywhere in our tree yet, so the code might be completely busted.
> >   But it may make a lot of sense to migrate other architectures to the
> >   same codebase.
> 
> I know that the implementations should be unified, or made machine
> independent as far as possible. I have focused on amd64's code to get
> the MP-safe processing working sooner on that platform - at the cost
> of increased code debt.
> 
> > * The softintr_disestablish() function isn't used a lot in our tree.
> >   It may make sense to postpone worrying about safely disestablishing
> >   mpsafe soft interrupts for now and simply panic if someone tries to
> >   do this.
> 
> Thinking about this, I realized that disestablishing is tricky even
> in the non-MP-safe case. A CPU might be trying to tear down a handler
> while another CPU is about to enter the handler but spinning for the
> kernel lock, which would cause a deadlock. Until now this has not been
> an issue with the implementations where the whole dispatch loop runs
> under the kernel lock.
> 
> Users of soft interrupts should ensure that no further scheduling of the
> soft interrupt happens when softintr_disestablish() is called. Typically
> this would be accomplished by disabling and draining the hard interrupt
> that uses the soft interrupt. In the usual case the hard interrupt is
> processed by a specific CPU. Consequently, the soft interrupt is run
> by that CPU as well, immediately after the hard interrupt. Once
> intr_barrier() returns, the soft interrupt should have quiesced.
> However, this looks quite fragile, and the SMR barrier has gained a tiny
> bit of appeal because it releases the lock temporarily.
> 
> > * Wouldn't it make sense for an mpsafe soft interrupt to protect
> >   itself from running simultaneously on multiple CPUs?  It probably
> >   already needs some sort of lock to protect the handler and other
> >   code running in process context on other CPUs.  And some handlers
> >   may be safe to run simultaneously anyway.
> 
> OpenBSD's soft interrupt code actually combines two things into one:
> implementation of soft interrupt vectors, and tasklet-like execution
> of handlers. In general, the former is part MD and part MI, whereas
> the latter is essentially fully MI.
> 
> Timeout processing could be among users of the semi-fixed set of soft
> interrupt vectors. Most device drivers needing soft interrupts could use
> the higher-level tasklet interface.
> 
> I have already thought about implementing the above. However, there are
> things that I should finish first.
> 
> > * I think we should avoid MAXCPU arrays if we can; adding stuff to
> >   struct cpu_info is probably a better approach here.
> 
> I agree. I used MAXCPUS to simplify the initialization.
> 
> What would be the right place to call per-CPU initialization of the
> subsystem? cpu_init() on amd64?

Here is a revised patch with a slightly altered approach.

I have replaced the per-CPU queues with system-wide queues to make
the code simpler.

When a CPU starts processing a soft interrupt, it reserves the handler
to prevent concurrent execution. If the soft interrupt gets rescheduled
during processing, the handler is run again by the same CPU. This breaks
FIFO ordering, though.
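
In condensed form, the dispatch logic in the patch below works roughly
as follows (a simplified sketch using the field and flag names from the
patch; assertions, statistics and the kernel-lock handling for
non-MP-safe handlers are omitted):

        mtx_enter(&softintr_lock);
        while ((sih = STAILQ_FIRST(queue)) != NULL) {
                sih->sih_state &= ~SIS_PENDING;
                STAILQ_REMOVE_HEAD(queue, sih_q);
                sih->sih_runner = ci;           /* reserve for this CPU */
restart:
                mtx_leave(&softintr_lock);
                (*sih->sih_fn)(sih->sih_arg);   /* kernel lock if !MPSAFE */
                mtx_enter(&softintr_lock);
                if (sih->sih_state & SIS_RESTART) {
                        /* Rescheduled while running; rerun on this CPU. */
                        sih->sih_state &= ~SIS_RESTART;
                        goto restart;
                }
                sih->sih_runner = NULL;         /* release the reservation */
        }
        mtx_leave(&softintr_lock);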

softintr_disestablish() now calls sched_barrier() to drain processing.
This should also prevent the deadlock with the kernel lock.
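
The corresponding teardown, condensed from the patch in the same way,
marks the handler dying and dequeues it under the mutex, then waits for
any CPU that still has the handler reserved:

        mtx_enter(&softintr_lock);
        sih->sih_state |= SIS_DYING;
        if (sih->sih_state & SIS_PENDING) {
                sih->sih_state &= ~SIS_PENDING;
                STAILQ_REMOVE(&softintr_queue[sih->sih_which], sih,
                    x86_soft_intrhand, sih_q);
        }
        runner = sih->sih_runner;
        mtx_leave(&softintr_lock);

        if (runner != NULL)
                sched_barrier(runner);  /* wait until the runner is done */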

It appeared tricky to add witness(4) integration. witness(4) treats the
kernel lock as a sleepable lock to avoid ordering issues with rwlocks.
This creates a dilemma in softintr_dispatch() because of nesting.
The natural lock class for the required soft interrupt pseudo-lock is
mutex, but that does not work if an MP-safe soft interrupt is preempted
by a higher-IPL non-MP-safe soft interrupt.

The dilemma could be worked around by preventing nesting when witness(4)
is enabled, altering the system's behaviour. Another approach is to skip
the pseudo-lock handling when processing is nested.

However, as soft interrupt handlers cannot use rwlocks, it should be
sufficient for deadlock prevention to ensure that the caller of
softintr_disestablish() does not hold spinning locks (the kernel lock
excluded). Hence softintr_disestablish() calls assertwaitok() at the
start. The assertion is actually implied by sched_barrier().
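
For illustration, a driver could use the interface roughly like this.
This is only a sketch: the driver names (mydrv_*, sc_sih, sc_ih) are
made up, and passing IPL_MPSAFE to softintr_establish() is the feature
that this patch adds.

        /* Attach: establish an MP-safe soft interrupt. */
        sc->sc_sih = softintr_establish(IPL_SOFTNET | IPL_MPSAFE,
            mydrv_softintr, sc);

        /* The hard interrupt handler defers work to the soft interrupt. */
        int
        mydrv_intr(void *arg)
        {
                struct mydrv_softc *sc = arg;

                /* ... acknowledge the hardware, queue work ... */
                softintr_schedule(sc->sc_sih);
                return (1);
        }

        /*
         * Detach: quiesce the hard interrupt first so that nothing
         * reschedules the handler, then tear down the soft interrupt;
         * with this patch it waits for a running handler to finish.
         */
        intr_disestablish(sc->sc_ih);
        softintr_disestablish(sc->sc_sih);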

Index: arch/amd64/amd64/softintr.c
===================================================================
RCS file: src/sys/arch/amd64/amd64/softintr.c,v
retrieving revision 1.10
diff -u -p -r1.10 softintr.c
--- arch/amd64/amd64/softintr.c 11 Sep 2020 09:27:09 -0000      1.10
+++ arch/amd64/amd64/softintr.c 23 May 2021 08:56:28 -0000
@@ -37,12 +37,33 @@
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/malloc.h>
+#include <sys/mutex.h>
+#include <sys/queue.h>
 
 #include <machine/intr.h>
 
 #include <uvm/uvm_extern.h>
 
-struct x86_soft_intr x86_soft_intrs[X86_NSOFTINTR];
+struct x86_soft_intrhand {
+       STAILQ_ENTRY(x86_soft_intrhand) sih_q;
+       void                    (*sih_fn)(void *);
+       void                    *sih_arg;
+       struct cpu_info         *sih_runner;
+       int                     sih_which;
+       unsigned short          sih_flags;
+       unsigned short          sih_state;
+};
+
+#define SIF_MPSAFE             0x0001
+
+#define SIS_DYING              0x0001
+#define SIS_PENDING            0x0002
+#define SIS_RESTART            0x0004
+
+STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
+
+struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
+struct mutex softintr_lock = MUTEX_INITIALIZER(IPL_HIGH);
 
 const int x86_soft_intr_to_ssir[X86_NSOFTINTR] = {
        SIR_CLOCK,
@@ -58,15 +79,10 @@ const int x86_soft_intr_to_ssir[X86_NSOF
 void
 softintr_init(void)
 {
-       struct x86_soft_intr *si;
        int i;
 
-       for (i = 0; i < X86_NSOFTINTR; i++) {
-               si = &x86_soft_intrs[i];
-               TAILQ_INIT(&si->softintr_q);
-               mtx_init(&si->softintr_lock, IPL_HIGH);
-               si->softintr_ssir = x86_soft_intr_to_ssir[i];
-       }
+       for (i = 0; i < X86_NSOFTINTR; i++)
+               STAILQ_INIT(&softintr_queue[i]);
 }
 
 /*
@@ -78,31 +94,45 @@ void
 softintr_dispatch(int which)
 {
        struct cpu_info *ci = curcpu();
-       struct x86_soft_intr *si = &x86_soft_intrs[which];
+       struct x86_soft_intr_queue *queue = &softintr_queue[which];
        struct x86_soft_intrhand *sih;
        int floor;
 
        floor = ci->ci_handled_intr_level;
        ci->ci_handled_intr_level = ci->ci_ilevel;
 
-       KERNEL_LOCK();
-       for (;;) {
-               mtx_enter(&si->softintr_lock);
-               sih = TAILQ_FIRST(&si->softintr_q);
-               if (sih == NULL) {
-                       mtx_leave(&si->softintr_lock);
-                       break;
+       mtx_enter(&softintr_lock);
+       while ((sih = STAILQ_FIRST(queue)) != NULL) {
+               KASSERT((sih->sih_state & (SIS_PENDING | SIS_RESTART)) ==
+                   SIS_PENDING);
+               KASSERT(sih->sih_runner == NULL);
+
+               sih->sih_state &= ~SIS_PENDING;
+               STAILQ_REMOVE_HEAD(queue, sih_q);
+               sih->sih_runner = ci;
+restart:
+               mtx_leave(&softintr_lock);
+
+               if (sih->sih_flags & SIF_MPSAFE) {
+                       (*sih->sih_fn)(sih->sih_arg);
+               } else {
+                       KERNEL_LOCK();
+                       (*sih->sih_fn)(sih->sih_arg);
+                       KERNEL_UNLOCK();
                }
-               TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
-               sih->sih_pending = 0;
 
+               mtx_enter(&softintr_lock);
                uvmexp.softs++;
 
-               mtx_leave(&si->softintr_lock);
+               KASSERT((sih->sih_state & SIS_PENDING) == 0);
+               if (sih->sih_state & SIS_RESTART) {
+                       sih->sih_state &= ~SIS_RESTART;
+                       goto restart;
+               }
 
-               (*sih->sih_fn)(sih->sih_arg);
+               sih->sih_runner = NULL;
        }
-       KERNEL_UNLOCK();
+       mtx_leave(&softintr_lock);
 
        ci->ci_handled_intr_level = floor;
 }
@@ -115,9 +145,13 @@ softintr_dispatch(int which)
 void *
 softintr_establish(int ipl, void (*func)(void *), void *arg)
 {
-       struct x86_soft_intr *si;
        struct x86_soft_intrhand *sih;
        int which;
+       unsigned short flags = 0;
+
+       if (ipl & IPL_MPSAFE)
+               flags |= SIF_MPSAFE;
+       ipl &= ~IPL_MPSAFE;
 
        switch (ipl) {
        case IPL_SOFTCLOCK:
@@ -137,14 +171,14 @@ softintr_establish(int ipl, void (*func)
                panic("softintr_establish");
        }
 
-       si = &x86_soft_intrs[which];
-
        sih = malloc(sizeof(*sih), M_DEVBUF, M_NOWAIT);
        if (__predict_true(sih != NULL)) {
-               sih->sih_intrhead = si;
                sih->sih_fn = func;
                sih->sih_arg = arg;
-               sih->sih_pending = 0;
+               sih->sih_runner = NULL;
+               sih->sih_which = which;
+               sih->sih_flags = flags;
+               sih->sih_state = 0;
        }
        return (sih);
 }
@@ -157,15 +191,52 @@ softintr_establish(int ipl, void (*func)
 void
 softintr_disestablish(void *arg)
 {
+       struct cpu_info *runner = NULL;
        struct x86_soft_intrhand *sih = arg;
-       struct x86_soft_intr *si = sih->sih_intrhead;
 
-       mtx_enter(&si->softintr_lock);
-       if (sih->sih_pending) {
-               TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
-               sih->sih_pending = 0;
+       assertwaitok();
+
+       mtx_enter(&softintr_lock);
+       sih->sih_state |= SIS_DYING;
+       if (sih->sih_state & SIS_PENDING) {
+               sih->sih_state &= ~SIS_PENDING;
+               STAILQ_REMOVE(&softintr_queue[sih->sih_which], sih,
+                   x86_soft_intrhand, sih_q);
        }
-       mtx_leave(&si->softintr_lock);
+       runner = sih->sih_runner;
+       mtx_leave(&softintr_lock);
+
+       if (runner != NULL)
+               sched_barrier(runner);
+
+       KASSERT((sih->sih_state & (SIS_PENDING | SIS_RESTART)) == 0);
+       KASSERT(sih->sih_runner == NULL);
 
        free(sih, M_DEVBUF, sizeof(*sih));
 }
+
+void
+softintr_schedule(void *arg)
+{
+       struct x86_soft_intrhand *sih = arg;
+       struct x86_soft_intr_queue *queue = &softintr_queue[sih->sih_which];
+       int trigger = 0;
+
+       mtx_enter(&softintr_lock);
+       KASSERT((sih->sih_state & SIS_DYING) == 0);
+       if (sih->sih_runner == NULL) {
+               KASSERT((sih->sih_state & SIS_RESTART) == 0);
+               if ((sih->sih_state & SIS_PENDING) == 0) {
+                       STAILQ_INSERT_TAIL(queue, sih, sih_q);
+                       sih->sih_state |= SIS_PENDING;
+                       trigger = 1;
+               }
+       } else {
+               KASSERT((sih->sih_state & SIS_PENDING) == 0);
+               sih->sih_state |= SIS_RESTART;
+       }
+       mtx_leave(&softintr_lock);
+
+       if (trigger)
+               softintr(x86_soft_intr_to_ssir[sih->sih_which]);
+}
Index: arch/amd64/include/intr.h
===================================================================
RCS file: src/sys/arch/amd64/include/intr.h,v
retrieving revision 1.32
diff -u -p -r1.32 intr.h
--- arch/amd64/include/intr.h   17 Jun 2020 06:14:52 -0000      1.32
+++ arch/amd64/include/intr.h   23 May 2021 08:56:28 -0000
@@ -231,42 +231,13 @@ extern void (*ipifunc[X86_NIPI])(struct 
 #define        X86_NSOFTINTR                   3
 
 #ifndef _LOCORE
-#include <sys/queue.h>
-
-struct x86_soft_intrhand {
-       TAILQ_ENTRY(x86_soft_intrhand)
-               sih_q;
-       struct x86_soft_intr *sih_intrhead;
-       void    (*sih_fn)(void *);
-       void    *sih_arg;
-       int     sih_pending;
-};
-
-struct x86_soft_intr {
-       TAILQ_HEAD(, x86_soft_intrhand)
-                       softintr_q;
-       int             softintr_ssir;
-       struct mutex    softintr_lock;
-};
 
 void   *softintr_establish(int, void (*)(void *), void *);
 void   softintr_disestablish(void *);
 void   softintr_init(void);
 void   softintr_dispatch(int);
+void   softintr_schedule(void *);
 
-#define        softintr_schedule(arg)                                          \
-do {                                                                   \
-       struct x86_soft_intrhand *__sih = (arg);                        \
-       struct x86_soft_intr *__si = __sih->sih_intrhead;               \
-                                                                       \
-       mtx_enter(&__si->softintr_lock);                                \
-       if (__sih->sih_pending == 0) {                                  \
-               TAILQ_INSERT_TAIL(&__si->softintr_q, __sih, sih_q);     \
-               __sih->sih_pending = 1;                                 \
-               softintr(__si->softintr_ssir);                          \
-       }                                                               \
-       mtx_leave(&__si->softintr_lock);                                \
-} while (/*CONSTCOND*/ 0)
 #endif /* _LOCORE */
 
 #endif /* !_MACHINE_INTR_H_ */
