On Sat, May 22, 2021 at 02:33:47PM +0200, Mark Kettenis wrote:
> > Date: Sat, 22 May 2021 11:11:38 +0000
> > From: Visa Hankala <[email protected]>
> >
> > On Wed, May 19, 2021 at 05:11:09PM -0500, Scott Cheloha wrote:
> > > Hi,
> > >
> > > visa@ says I need to unlock softintr_dispatch() before I can
> > > unlock softclock(), so let's do that.
> > >
> > > Additionally, when we call softintr_disestablish() we want to wait for
> > > the underlying softintr handle to finish running if it is running.
> > >
> > > We can start with amd64.
> > >
> > > I think this approach will work:
> > >
> > > - Keep a pointer to the running softintr, if any, in the queue. NULL
> > > the pointer when we return from sih_func().
> > >
> > > - Take/release the kernel lock when we enter/leave sih_func(),
> > > unless the SI_MPSAFE flag is present.
> > >
> > > - If the handle is running when you call softintr_disestablish(), spin
> > > until the handle isn't running anymore and retry.
> > >
> > > There is no softintr manpage but I think it is understood that
> > > softintr_disestablish() is only safe to call from a process context,
> > > otherwise you may deadlock. Maybe we should do splassert(IPL_NONE)?
> > >
> > > We could probably sleep here instead of spinning. We'd have to change
> > > every softintr_disestablish() implementation to do that, though.
> > > Otherwise you'd have different behavior on different platforms.
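Just to make sure I follow Scott's proposal above, I would picture it
roughly like the sketch below.  This is only a sketch: si_running and
sih_flags are field names I made up here, SI_MPSAFE is the flag from
the description, and none of this is taken from an actual diff.

    void
    softintr_dispatch(int which)
    {
        struct x86_soft_intr *si = &x86_soft_intrs[which];
        struct x86_soft_intrhand *sih;

        for (;;) {
            mtx_enter(&si->softintr_lock);
            sih = TAILQ_FIRST(&si->softintr_q);
            if (sih == NULL) {
                mtx_leave(&si->softintr_lock);
                break;
            }
            TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
            sih->sih_pending = 0;
            si->si_running = sih;       /* remember the running handler */
            mtx_leave(&si->softintr_lock);

            if (sih->sih_flags & SI_MPSAFE) {
                (*sih->sih_fn)(sih->sih_arg);
            } else {
                KERNEL_LOCK();
                (*sih->sih_fn)(sih->sih_arg);
                KERNEL_UNLOCK();
            }

            mtx_enter(&si->softintr_lock);
            si->si_running = NULL;      /* NULL the pointer on return */
            mtx_leave(&si->softintr_lock);
        }
    }

with softintr_disestablish() spinning on si_running:

    mtx_enter(&si->softintr_lock);
    if (sih->sih_pending) {
        TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
        sih->sih_pending = 0;
    }
    while (si->si_running == sih) {
        /* The handler is running on some CPU; wait for it to return. */
        mtx_leave(&si->softintr_lock);
        CPU_BUSY_CYCLE();
        mtx_enter(&si->softintr_lock);
    }
    mtx_leave(&si->softintr_lock);
    free(sih, M_DEVBUF, sizeof(*sih));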
> >
> > I think your diff does not pay enough attention to the fact that soft
> > interrupts are handled by all CPUs. I think the diff that I posted
> > a while ago [1] is better in that respect.
> >
> > The two biggest things that I do not like in my original diff are
> > the synchronization of handler execution and the use of the SMR barrier.
> >
> > [1] https://marc.info/?l=openbsd-tech&m=162092714911609
> >
> > The kernel lock has guaranteed that at most one CPU is able to run
> > a given soft interrupt handler at a time. My diff used a mutex to
> > prevent concurrent execution. However, it is wasteful to spin. It would
> > be more economical to let the current runner of the handler re-execute
> > the code.
> >
> > The SMR barrier in softintr_disestablish() was a trick to drain any
> > pending activity. However, it made me feel uneasy because I have not
> > checked every caller of softintr_disestablish(). My main worry is not
> > the latency but unexpected side effects.
> >
> > Below is a revised diff that improves the above two points.
> >
> > When a soft interrupt handler is scheduled, it is assigned to a CPU.
> > That CPU will keep running the handler as long as there are pending
> > requests. Once all pending requests have been drained, the CPU
> > relinquishes its hold of the handler. This provides natural
> > serialization.
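(If I follow the diff below correctly, the ownership rule condenses to
roughly the following; this is just a paraphrase of the code further
down, not additional behaviour.)

    /* softintr_schedule(): the first request takes ownership. */
    mtx_enter(&softintr_lock);
    if (sih->sih_pending == 0) {
        if (sih->sih_owner == NULL) {
            /* Assign the handler to the scheduling CPU... */
            WRITE_ONCE(sih->sih_owner, cursip);
            /* ...which is the one that raises the soft interrupt. */
            trigger = 1;
        }
        TAILQ_INSERT_TAIL(&sih->sih_owner->sip_queue[sih->sih_which],
            sih, sih_q);
        sih->sih_pending = 1;
    }
    mtx_leave(&softintr_lock);

    /* softintr_dispatch(): after running the handler, the owning CPU
     * keeps ownership while a new request is pending and releases it
     * only once the handler has been drained. */
    if (sih->sih_pending == 0)
        WRITE_ONCE(sih->sih_owner, NULL);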
> >
> > Now softintr_disestablish() uses spinning for draining activity.
> > I still have slight qualms about this, though, because the feature
> > has not been so explicit before. Integration with witness(4) might be
> > in order.
> >
> > softintr_disestablish() uses READ_ONCE() to enforce reloading of the
> > value in the busy-wait loop. This way the variable does not need to be
> > volatile. (As yet another option, CPU_BUSY_CYCLE() could always
> > imply memory clobbering, which should make an optimizing compiler
> > redo the load.) For consistency with this READ_ONCE(), WRITE_ONCE() is
> > used whenever the variable is written, excluding the initialization.
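(To spell out what READ_ONCE() buys in that loop: as far as I remember,
READ_ONCE() in <sys/atomic.h> is just a load through a volatile-qualified
pointer, so the compiler has to redo it on every iteration.)

    /* As in the diff below: the owner is reloaded each time around. */
    while (READ_ONCE(sih->sih_owner) != NULL)
        CPU_BUSY_CYCLE();

    /* With a plain access, and unless CPU_BUSY_CYCLE() implies a memory
     * clobber (which, as you note, is not guaranteed today), the
     * compiler may hoist the load out of the loop and spin forever on
     * a stale value: */
    while (sih->sih_owner != NULL)
        CPU_BUSY_CYCLE();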
> >
> > The patch uses a single mutex for access serialization. The old code
> > used one mutex per soft IPL level, but I am not sure how
> > useful that has been. I think it would be better to have a separate
> > mutex for each CPU. However, the increased code complexity might not
> > be worthwhile at the moment. Even having the per-CPU queues has
> > a whiff of being somewhat overkill.
>
> A few comments:
>
> * Looking at amd64 in isolation does not make sense. Like a lot of MD
> code in OpenBSD the softintr code was copied from whatever
> Net/FreeBSD had at the time, with no attempt at unification (it
> works, check it in, don't go back to clean it up). However, with
> powerpc64 and riscv64 we try to do things a little bit better in
> that area. So arm64, powerpc64 and riscv64 share the same softintr
> implementation that already implements softintr_establish_flags()
> with SOFTINTR_ESTABLISH_MPSAFE. Now we haven't used that flag
> anywhere in our tree yet, so the code might be completely busted.
> But it may make a lot of sense to migrate other architectures to the
> same codebase.
>
> * The softintr_disestablish() function isn't used a lot in our tree.
> It may make sense to postpone worrying about safely disestablishing
> mpsafe soft interrupts for now and simply panic if someone tries to
> do this.
>
> * Wouldn't it make sense for an mpsafe soft interrupt to protect
> itself from running simultaneously on multiple CPUs?  It probably
> already needs some sort of lock to protect the handler and other
> code running in process context on other CPUs. And some handlers
> may be safe to run simultaneously anyway.
>
> * I think we should avoid MAXCPU arrays if we can; adding stuff to
> struct cpu_info is probably a better approach here.
Which reminds me of that 160-core arm64 machine I have access to...
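For the struct cpu_info approach I would picture something like the
following; ci_softintrq is a member name I just made up for
illustration.

    /* In amd64's <machine/cpu.h>: */
    struct cpu_info {
        /* ... existing members ... */
        TAILQ_HEAD(, x86_soft_intrhand) ci_softintrq[X86_NSOFTINTR];
    };

    /* softintr_dispatch() and softintr_schedule() then go through
     * curcpu() instead of indexing a MAXCPUS array: */
    struct cpu_info *ci = curcpu();
    struct x86_soft_intrhand *sih;

    while ((sih = TAILQ_FIRST(&ci->ci_softintrq[which])) != NULL) {
        /* same draining logic as in the diff */
    }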
> Cheers,
>
> Mark
>
> > Index: arch/amd64/amd64/softintr.c
> > ===================================================================
> > RCS file: src/sys/arch/amd64/amd64/softintr.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 softintr.c
> > --- arch/amd64/amd64/softintr.c 11 Sep 2020 09:27:09 -0000 1.10
> > +++ arch/amd64/amd64/softintr.c 22 May 2021 11:07:23 -0000
> > @@ -36,13 +36,37 @@
> >
> > #include <sys/param.h>
> > #include <sys/systm.h>
> > +#include <sys/atomic.h>
> > #include <sys/malloc.h>
> > +#include <sys/mutex.h>
> > +#include <sys/percpu.h>
> > +#include <sys/queue.h>
> >
> > #include <machine/intr.h>
> >
> > #include <uvm/uvm_extern.h>
> >
> > -struct x86_soft_intr x86_soft_intrs[X86_NSOFTINTR];
> > +struct x86_soft_intr_percpu;
> > +
> > +struct x86_soft_intrhand {
> > + TAILQ_ENTRY(x86_soft_intrhand) sih_q;
> > + void (*sih_fn)(void *);
> > + void *sih_arg;
> > + struct x86_soft_intr_percpu *sih_owner;
> > + int sih_which;
> > + unsigned short sih_flags;
> > + unsigned short sih_pending;
> > +};
> > +
> > +#define SIF_DYING 0x0001
> > +#define SIF_MPSAFE 0x0002
> > +
> > +struct x86_soft_intr_percpu {
> > + TAILQ_HEAD(, x86_soft_intrhand) sip_queue[X86_NSOFTINTR];
> > +} __aligned(CACHELINESIZE);
> > +
> > +struct x86_soft_intr_percpu x86_soft_intr_percpu[MAXCPUS];
> > +struct mutex softintr_lock = MUTEX_INITIALIZER(IPL_HIGH);
> >
> > const int x86_soft_intr_to_ssir[X86_NSOFTINTR] = {
> > SIR_CLOCK,
> > @@ -58,14 +82,13 @@ const int x86_soft_intr_to_ssir[X86_NSOF
> > void
> > softintr_init(void)
> > {
> > - struct x86_soft_intr *si;
> > - int i;
> > + struct x86_soft_intr_percpu *sip;
> > + int i, j;
> >
> > - for (i = 0; i < X86_NSOFTINTR; i++) {
> > - si = &x86_soft_intrs[i];
> > - TAILQ_INIT(&si->softintr_q);
> > - mtx_init(&si->softintr_lock, IPL_HIGH);
> > - si->softintr_ssir = x86_soft_intr_to_ssir[i];
> > + for (i = 0; i < MAXCPUS; i++) {
> > + sip = &x86_soft_intr_percpu[i];
> > + for (j = 0; j < X86_NSOFTINTR; j++)
> > + TAILQ_INIT(&sip->sip_queue[j]);
> > }
> > }
> >
> > @@ -78,31 +101,38 @@ void
> > softintr_dispatch(int which)
> > {
> > struct cpu_info *ci = curcpu();
> > - struct x86_soft_intr *si = &x86_soft_intrs[which];
> > struct x86_soft_intrhand *sih;
> > + struct x86_soft_intr_percpu *sip;
> > int floor;
> >
> > + sip = &x86_soft_intr_percpu[CPU_INFO_UNIT(ci)];
> > +
> > floor = ci->ci_handled_intr_level;
> > ci->ci_handled_intr_level = ci->ci_ilevel;
> >
> > - KERNEL_LOCK();
> > - for (;;) {
> > - mtx_enter(&si->softintr_lock);
> > - sih = TAILQ_FIRST(&si->softintr_q);
> > - if (sih == NULL) {
> > - mtx_leave(&si->softintr_lock);
> > - break;
> > - }
> > - TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> > + mtx_enter(&softintr_lock);
> > + while (!TAILQ_EMPTY(&sip->sip_queue[which])) {
> > + sih = TAILQ_FIRST(&sip->sip_queue[which]);
> > + KASSERT(sih->sih_owner == sip);
> > + KASSERT(sih->sih_pending);
> > sih->sih_pending = 0;
> > + TAILQ_REMOVE(&sip->sip_queue[which], sih, sih_q);
> > + mtx_leave(&softintr_lock);
> >
> > - uvmexp.softs++;
> > -
> > - mtx_leave(&si->softintr_lock);
> > + if (sih->sih_flags & SIF_MPSAFE) {
> > + (*sih->sih_fn)(sih->sih_arg);
> > + } else {
> > + KERNEL_LOCK();
> > + (*sih->sih_fn)(sih->sih_arg);
> > + KERNEL_UNLOCK();
> > + }
> >
> > - (*sih->sih_fn)(sih->sih_arg);
> > + mtx_enter(&softintr_lock);
> > + uvmexp.softs++;
> > + if (sih->sih_pending == 0)
> > + WRITE_ONCE(sih->sih_owner, NULL);
> > }
> > - KERNEL_UNLOCK();
> > + mtx_leave(&softintr_lock);
> >
> > ci->ci_handled_intr_level = floor;
> > }
> > @@ -115,9 +145,13 @@ softintr_dispatch(int which)
> > void *
> > softintr_establish(int ipl, void (*func)(void *), void *arg)
> > {
> > - struct x86_soft_intr *si;
> > struct x86_soft_intrhand *sih;
> > int which;
> > + unsigned short flags = 0;
> > +
> > + if (ipl & IPL_MPSAFE)
> > + flags |= SIF_MPSAFE;
> > + ipl &= ~IPL_MPSAFE;
> >
> > switch (ipl) {
> > case IPL_SOFTCLOCK:
> > @@ -137,13 +171,13 @@ softintr_establish(int ipl, void (*func)
> > panic("softintr_establish");
> > }
> >
> > - si = &x86_soft_intrs[which];
> > -
> > sih = malloc(sizeof(*sih), M_DEVBUF, M_NOWAIT);
> > if (__predict_true(sih != NULL)) {
> > - sih->sih_intrhead = si;
> > sih->sih_fn = func;
> > sih->sih_arg = arg;
> > + sih->sih_owner = NULL;
> > + sih->sih_which = which;
> > + sih->sih_flags = flags;
> > sih->sih_pending = 0;
> > }
> > return (sih);
> > @@ -158,14 +192,57 @@ void
> > softintr_disestablish(void *arg)
> > {
> > struct x86_soft_intrhand *sih = arg;
> > - struct x86_soft_intr *si = sih->sih_intrhead;
> > + struct x86_soft_intr_percpu *sip;
> > +
> > + mtx_enter(&softintr_lock);
> > +
> > + sih->sih_flags |= SIF_DYING;
> >
> > - mtx_enter(&si->softintr_lock);
> > if (sih->sih_pending) {
> > - TAILQ_REMOVE(&si->softintr_q, sih, sih_q);
> > + sip = sih->sih_owner;
> > + KASSERT(sip != NULL);
> > + TAILQ_REMOVE(&sip->sip_queue[sih->sih_which], sih, sih_q);
> > sih->sih_pending = 0;
> > }
> > - mtx_leave(&si->softintr_lock);
> > +
> > + if (sih->sih_owner != NULL) {
> > + mtx_leave(&softintr_lock);
> > + while (READ_ONCE(sih->sih_owner) != NULL)
> > + CPU_BUSY_CYCLE();
> > + /* Synchronize this CPU's view of memory. */
> > + mtx_enter(&softintr_lock);
> > + }
> > +
> > + mtx_leave(&softintr_lock);
> >
> > free(sih, M_DEVBUF, sizeof(*sih));
> > }
> > +
> > +void
> > +softintr_schedule(void *arg)
> > +{
> > + struct x86_soft_intr_percpu *cursip, *sip;
> > + struct x86_soft_intrhand *sih = arg;
> > + int trigger = 0;
> > +
> > + cursip = &x86_soft_intr_percpu[cpu_number()];
> > +
> > + mtx_enter(&softintr_lock);
> > +
> > + KASSERT((sih->sih_flags & SIF_DYING) == 0);
> > +
> > + if (sih->sih_pending == 0) {
> > + if (sih->sih_owner == NULL) {
> > + WRITE_ONCE(sih->sih_owner, cursip);
> > + trigger = 1;
> > + }
> > + sip = sih->sih_owner;
> > + TAILQ_INSERT_TAIL(&sip->sip_queue[sih->sih_which], sih, sih_q);
> > + sih->sih_pending = 1;
> > + }
> > +
> > + mtx_leave(&softintr_lock);
> > +
> > + if (trigger)
> > + softintr(x86_soft_intr_to_ssir[sih->sih_which]);
> > +}
> > Index: arch/amd64/include/intr.h
> > ===================================================================
> > RCS file: src/sys/arch/amd64/include/intr.h,v
> > retrieving revision 1.32
> > diff -u -p -r1.32 intr.h
> > --- arch/amd64/include/intr.h 17 Jun 2020 06:14:52 -0000 1.32
> > +++ arch/amd64/include/intr.h 22 May 2021 11:07:24 -0000
> > @@ -231,42 +231,13 @@ extern void (*ipifunc[X86_NIPI])(struct
> > #define X86_NSOFTINTR 3
> >
> > #ifndef _LOCORE
> > -#include <sys/queue.h>
> > -
> > -struct x86_soft_intrhand {
> > - TAILQ_ENTRY(x86_soft_intrhand)
> > - sih_q;
> > - struct x86_soft_intr *sih_intrhead;
> > - void (*sih_fn)(void *);
> > - void *sih_arg;
> > - int sih_pending;
> > -};
> > -
> > -struct x86_soft_intr {
> > - TAILQ_HEAD(, x86_soft_intrhand)
> > - softintr_q;
> > - int softintr_ssir;
> > - struct mutex softintr_lock;
> > -};
> >
> > void *softintr_establish(int, void (*)(void *), void *);
> > void softintr_disestablish(void *);
> > void softintr_init(void);
> > void softintr_dispatch(int);
> > +void softintr_schedule(void *);
> >
> > -#define softintr_schedule(arg) \
> > -do { \
> > - struct x86_soft_intrhand *__sih = (arg); \
> > - struct x86_soft_intr *__si = __sih->sih_intrhead; \
> > - \
> > - mtx_enter(&__si->softintr_lock); \
> > - if (__sih->sih_pending == 0) { \
> > - TAILQ_INSERT_TAIL(&__si->softintr_q, __sih, sih_q); \
> > - __sih->sih_pending = 1; \
> > - softintr(__si->softintr_ssir); \
> > - } \
> > - mtx_leave(&__si->softintr_lock); \
> > -} while (/*CONSTCOND*/ 0)
> > #endif /* _LOCORE */
> >
> > #endif /* !_MACHINE_INTR_H_ */
> >
> >
>