On Tue, May 05, 2015 at 11:02 -0700, Philip Guenther wrote:
> On Tue, May 5, 2015 at 9:35 AM, Mike Belopuhov <[email protected]> wrote:
> ...
> > Here's a diff to remedy this.  This is the same chunk as in the
> > tsleep, except it uses semantics of msleep.  IPL dance is there
> > to negate the IPL changing effect of mtx_enter/mtx_leave so that
> > splx(safepri) operation is actually changing IPL level from HIGH
> > to "safepri" and then back to the mutex IPL level.
> ...
> > +               spl = MUTEX_OLDIPL(mtx);
> > +               MUTEX_OLDIPL(mtx) = splhigh();
> > +               mtx_leave(mtx);
> > +
> > +               splx(safepri);
> 
> Hmm, since mtx_leave() effectively ends with
> "splx(MUTEX_OLDIPL(mtx));", can't we just say:
> 
>               spl = MUTEX_OLDIPL(mtx);
>               MUTEX_OLDIPL(mtx) = safepri;
>               mtx_leave(mtx);
> 
> (The tsleep() version needs splhigh() because that's the only safe way
> to get the current spl level so that it can be restored before
> returning.  Here in msleep() the mutex's MUTEX_OLDIPL() already has
> the value we need.)
> 
> 
> Philip Guenther

Sorry for the delay.  I have finally figured out why it's not enough
for MP (but still needs to go in).  The problem is that panic can run
on any CPU and that's where we're holding the kernel lock.  Interrupts
are happening on cpu0 and our lock prevents them from running.  It was
rather easy to demonstrate by dropping and re-acquiring the kernel lock
in msleep/tsleep.

I believe the easiest solution is to force SCSI_POLL and run all I/O
on the CPU we paniced on and that's what I'm currently investigating.

diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
index fbf7684..03308b4 100644
--- sys/kern/kern_synch.c
+++ sys/kern/kern_synch.c
@@ -140,11 +140,11 @@ tsleep(const volatile void *ident, int priority, const 
char *wmesg, int timo)
 
        return (error);
 }
 
 /*
- * Same as tsleep, but if we have a mutex provided, then once we've 
+ * Same as tsleep, but if we have a mutex provided, then once we've
  * entered the sleep queue we drop the mutex. After sleeping we re-lock.
  */
 int
 msleep(const volatile void *ident, struct mutex *mtx, int priority,
     const char *wmesg, int timo)
@@ -153,16 +153,34 @@ msleep(const volatile void *ident, struct mutex *mtx, int 
priority,
        int error, error1, spl;
 
        KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
        KASSERT(mtx != NULL);
 
+       if (cold || panicstr) {
+               /*
+                * After a panic, or during autoconfiguration,
+                * just give interrupts a chance, then just return;
+                * don't run any other procs or panic below,
+                * in case this is the idle process and already asleep.
+                */
+               spl = MUTEX_OLDIPL(mtx);
+               MUTEX_OLDIPL(mtx) = safepri;
+               mtx_leave(mtx);
+               if ((priority & PNORELOCK) == 0) {
+                       mtx_enter(mtx);
+                       MUTEX_OLDIPL(mtx) = spl;
+               } else
+                       splx(spl);
+               return (0);
+       }
+
        sleep_setup(&sls, ident, priority, wmesg);
        sleep_setup_timeout(&sls, timo);
        sleep_setup_signal(&sls, priority);
 
        /* XXX - We need to make sure that the mutex doesn't
-        * unblock splsched. This can be made a bit more 
+        * unblock splsched. This can be made a bit more
         * correct when the sched_lock is a mutex.
         */
        spl = MUTEX_OLDIPL(mtx);
        MUTEX_OLDIPL(mtx) = splsched();
        mtx_leave(mtx);

Reply via email to