Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-11 Thread Mark Kettenis
> Date: Mon, 11 May 2015 16:54:54 +0200
> From: Mike Belopuhov 
> 
> On Fri, May 08, 2015 at 20:28 +0200, Mark Kettenis wrote:
> > > Date: Fri, 8 May 2015 20:15:58 +0200
> > > From: Mike Belopuhov 
> > > 
> > > On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > > > kernel lock in the "cold || panicstr" case.
> > > > 
> > > > Well, it's not hard to do really, but...
> > > > 
> > > > > We might need this for
> > > > > handling interrupts during autoconf as soon as we start distributing
> > > > > interrupts over CPUs.
> > > > >
> > > > 
> > > > ...cold might mean that interrupts are not ready yet.  So then we might
> > > > need another flag for shutdown?
> > > 
> > > This is what I have come up with.  Chunks were taken directly from
> > > mi_switch and it seems to do the job just fine.  Right now I'm not
> > > using any additional flags and it seems to work here.  I'll resume
> > > testing on Monday, but it looks fairly complete.  Any comments?
> > 
> > Makes sense to me.  The msleep/tsleep code could be simplified to:
> > 
> > if (__mp_lock_held(&kernel_lock)) {
> > hold_count = __mp_release_all(&kernel_lock);
> > __mp_acquire_count(&kernel_lock, hold_count);
> > }
> >
> 
> Indeed.
> 
> > I'm also wondering whether we should change __mp_release_all() to
> > simply return 0 if the current CPU does not hold the lock, such that
> > the __mp_lock_held() check isn't needed anymore.  But that's a
> > separate issue.
> > 
> 
> It would also make __mp_release_all() safer, since it wouldn't
> require an external check.
> 
> I don't have any further input on this; the diff works fine here.
> 
> OK?

ok kettenis@

> diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
> index 03308b4..6f573fc 100644
> --- sys/kern/kern_synch.c
> +++ sys/kern/kern_synch.c
> @@ -103,10 +103,13 @@ extern int safepri;
>  int
>  tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  {
>   struct sleep_state sls;
>   int error, error1;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
>  
>  #ifdef MULTIPROCESSOR
>   KASSERT(timo || __mp_lock_held(&kernel_lock));
> @@ -120,10 +123,16 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>* don't run any other procs or panic below,
>* in case this is the idle process and already asleep.
>*/
>   s = splhigh();
>   splx(safepri);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock)) {
> + hold_count = __mp_release_all(&kernel_lock);
> + __mp_acquire_count(&kernel_lock, hold_count);
> + }
> +#endif
>   splx(s);
>   return (0);
>   }
>  
>   sleep_setup(&sls, ident, priority, wmesg);
> @@ -149,10 +158,13 @@ int
>  msleep(const volatile void *ident, struct mutex *mtx, int priority,
>  const char *wmesg, int timo)
>  {
>   struct sleep_state sls;
>   int error, error1, spl;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
>   KASSERT(mtx != NULL);
>  
>   if (cold || panicstr) {
> @@ -163,10 +175,16 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
>* in case this is the idle process and already asleep.
>*/
>   spl = MUTEX_OLDIPL(mtx);
>   MUTEX_OLDIPL(mtx) = safepri;
>   mtx_leave(mtx);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock)) {
> + hold_count = __mp_release_all(&kernel_lock);
> + __mp_acquire_count(&kernel_lock, hold_count);
> + }
> +#endif
>   if ((priority & PNORELOCK) == 0) {
>   mtx_enter(mtx);
>   MUTEX_OLDIPL(mtx) = spl;
>   } else
>   splx(spl);
> diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
> index a26fbe2..a373789 100644
> --- sys/kern/vfs_subr.c
> +++ sys/kern/vfs_subr.c
> @@ -1664,10 +1664,13 @@ int
>  vfs_syncwait(int verbose)
>  {
>   struct buf *bp;
>   int iter, nbusy, dcount, s;
>   struct proc *p;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   p = curproc? curproc : &proc0;
>   sys_sync(p, (void *)0, (register_t *)0);
>  
>   /* Wait for sync to finish. */
> @@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
>   }
>   if (nbusy == 0)
>   break;
>   if (verbose)
>   printf("%d ", nbusy);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock))
> + hold_count = __mp_release_all(&kernel_lock);
> + else
> + hold_count = 0;
> +#endif
>   DELAY(4 * iter);
> +#ifdef MULTIPROCESSOR
> + if (hold_count)
> + __mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>   }
>  
>   return nbusy;
>  }

Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-11 Thread Mike Belopuhov
On Fri, May 08, 2015 at 20:28 +0200, Mark Kettenis wrote:
> > Date: Fri, 8 May 2015 20:15:58 +0200
> > From: Mike Belopuhov 
> > 
> > On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > > kernel lock in the "cold || panicstr" case.
> > > 
> > > Well, it's not hard to do really, but...
> > > 
> > > > We might need this for
> > > > handling interrupts during autoconf as soon as we start distributing
> > > > interrupts over CPUs.
> > > >
> > > 
> > > ...cold might mean that interrupts are not ready yet.  So then we might
> > > need another flag for shutdown?
> > 
> > This is what I have come up with.  Chunks were taken directly from
> > mi_switch and it seems to do the job just fine.  Right now I'm not
> > using any additional flags and it seems to work here.  I'll resume
> > testing on Monday, but it looks fairly complete.  Any comments?
> 
> Makes sense to me.  The msleep/tsleep code could be simplified to:
> 
>   if (__mp_lock_held(&kernel_lock)) {
>   hold_count = __mp_release_all(&kernel_lock);
>   __mp_acquire_count(&kernel_lock, hold_count);
>   }
>

Indeed.

> I'm also wondering whether we should change __mp_release_all() to
> simply return 0 if the current CPU does not hold the lock, such that
> the __mp_lock_held() check isn't needed anymore.  But that's a
> separate issue.
> 

It would also make __mp_release_all() safer, since it wouldn't
require an external check.

I don't have any further input on this; the diff works fine here.

OK?

diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
index 03308b4..6f573fc 100644
--- sys/kern/kern_synch.c
+++ sys/kern/kern_synch.c
@@ -103,10 +103,13 @@ extern int safepri;
 int
 tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 {
struct sleep_state sls;
int error, error1;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
 
 #ifdef MULTIPROCESSOR
KASSERT(timo || __mp_lock_held(&kernel_lock));
@@ -120,10 +123,16 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 * don't run any other procs or panic below,
 * in case this is the idle process and already asleep.
 */
s = splhigh();
splx(safepri);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock)) {
+   hold_count = __mp_release_all(&kernel_lock);
+   __mp_acquire_count(&kernel_lock, hold_count);
+   }
+#endif
splx(s);
return (0);
}
 
sleep_setup(&sls, ident, priority, wmesg);
@@ -149,10 +158,13 @@ int
 msleep(const volatile void *ident, struct mutex *mtx, int priority,
 const char *wmesg, int timo)
 {
struct sleep_state sls;
int error, error1, spl;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
KASSERT(mtx != NULL);
 
if (cold || panicstr) {
@@ -163,10 +175,16 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
 * in case this is the idle process and already asleep.
 */
spl = MUTEX_OLDIPL(mtx);
MUTEX_OLDIPL(mtx) = safepri;
mtx_leave(mtx);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock)) {
+   hold_count = __mp_release_all(&kernel_lock);
+   __mp_acquire_count(&kernel_lock, hold_count);
+   }
+#endif
if ((priority & PNORELOCK) == 0) {
mtx_enter(mtx);
MUTEX_OLDIPL(mtx) = spl;
} else
splx(spl);
diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
index a26fbe2..a373789 100644
--- sys/kern/vfs_subr.c
+++ sys/kern/vfs_subr.c
@@ -1664,10 +1664,13 @@ int
 vfs_syncwait(int verbose)
 {
struct buf *bp;
int iter, nbusy, dcount, s;
struct proc *p;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
p = curproc? curproc : &proc0;
sys_sync(p, (void *)0, (register_t *)0);
 
/* Wait for sync to finish. */
@@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
}
if (nbusy == 0)
break;
if (verbose)
printf("%d ", nbusy);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock))
+   hold_count = __mp_release_all(&kernel_lock);
+   else
+   hold_count = 0;
+#endif
DELAY(4 * iter);
+#ifdef MULTIPROCESSOR
+   if (hold_count)
+   __mp_acquire_count(&kernel_lock, hold_count);
+#endif
}
 
return nbusy;
 }
 



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mark Kettenis
> Date: Fri, 8 May 2015 20:15:58 +0200
> From: Mike Belopuhov 
> 
> On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > kernel lock in the "cold || panicstr" case.
> > 
> > Well, it's not hard to do really, but...
> > 
> > > We might need this for
> > > handling interrupts during autoconf as soon as we start distributing
> > > interrupts over CPUs.
> > >
> > 
> > ...cold might mean that interrupts are not ready yet.  So then we might
> > need another flag for shutdown?
> 
> This is what I have come up with.  Chunks were taken directly from
> mi_switch and it seems to do the job just fine.  Right now I'm not
> using any additional flags and it seems to work here.  I'll resume
> testing on Monday, but it looks fairly complete.  Any comments?

Makes sense to me.  The msleep/tsleep code could be simplified to:

if (__mp_lock_held(&kernel_lock)) {
hold_count = __mp_release_all(&kernel_lock);
__mp_acquire_count(&kernel_lock, hold_count);
}

I'm also wondering whether we should change __mp_release_all() to
simply return 0 if the current CPU does not hold the lock, such that
the __mp_lock_held() check isn't needed anymore.  But that's a
separate issue.

> diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
> index 03308b4..4089adf 100644
> --- sys/kern/kern_synch.c
> +++ sys/kern/kern_synch.c
> @@ -103,10 +103,13 @@ extern int safepri;
>  int
>  tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  {
>   struct sleep_state sls;
>   int error, error1;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
>  
>  #ifdef MULTIPROCESSOR
>   KASSERT(timo || __mp_lock_held(&kernel_lock));
> @@ -120,10 +123,18 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>* don't run any other procs or panic below,
>* in case this is the idle process and already asleep.
>*/
>   s = splhigh();
>   splx(safepri);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock))
> + hold_count = __mp_release_all(&kernel_lock);
> + else
> + hold_count = 0;
> + if (hold_count)
> + __mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>   splx(s);
>   return (0);
>   }
>  
>   sleep_setup(&sls, ident, priority, wmesg);
> @@ -149,10 +160,13 @@ int
>  msleep(const volatile void *ident, struct mutex *mtx, int priority,
>  const char *wmesg, int timo)
>  {
>   struct sleep_state sls;
>   int error, error1, spl;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
>   KASSERT(mtx != NULL);
>  
>   if (cold || panicstr) {
> @@ -163,10 +177,18 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
>* in case this is the idle process and already asleep.
>*/
>   spl = MUTEX_OLDIPL(mtx);
>   MUTEX_OLDIPL(mtx) = safepri;
>   mtx_leave(mtx);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock))
> + hold_count = __mp_release_all(&kernel_lock);
> + else
> + hold_count = 0;
> + if (hold_count)
> + __mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>   if ((priority & PNORELOCK) == 0) {
>   mtx_enter(mtx);
>   MUTEX_OLDIPL(mtx) = spl;
>   } else
>   splx(spl);
> diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
> index a26fbe2..a373789 100644
> --- sys/kern/vfs_subr.c
> +++ sys/kern/vfs_subr.c
> @@ -1664,10 +1664,13 @@ int
>  vfs_syncwait(int verbose)
>  {
>   struct buf *bp;
>   int iter, nbusy, dcount, s;
>   struct proc *p;
> +#ifdef MULTIPROCESSOR
> + int hold_count;
> +#endif
>  
>   p = curproc? curproc : &proc0;
>   sys_sync(p, (void *)0, (register_t *)0);
>  
>   /* Wait for sync to finish. */
> @@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
>   }
>   if (nbusy == 0)
>   break;
>   if (verbose)
>   printf("%d ", nbusy);
> +#ifdef MULTIPROCESSOR
> + if (__mp_lock_held(&kernel_lock))
> + hold_count = __mp_release_all(&kernel_lock);
> + else
> + hold_count = 0;
> +#endif
>   DELAY(4 * iter);
> +#ifdef MULTIPROCESSOR
> + if (hold_count)
> + __mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>   }
>  
>   return nbusy;
>  }
>  
> 
> 



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mike Belopuhov
On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > kernel lock in the "cold || panicstr" case.
> 
> Well, it's not hard to do really, but...
> 
> > We might need this for
> > handling interrupts during autoconf as soon as we start distributing
> > interrupts over CPUs.
> >
> 
> ...cold might mean that interrupts are not ready yet.  So then we might
> need another flag for shutdown?

This is what I have come up with.  Chunks were taken directly from
mi_switch and it seems to do the job just fine.  Right now I'm not
using any additional flags and it seems to work here.  I'll resume
testing on Monday, but it looks fairly complete.  Any comments?

diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
index 03308b4..4089adf 100644
--- sys/kern/kern_synch.c
+++ sys/kern/kern_synch.c
@@ -103,10 +103,13 @@ extern int safepri;
 int
 tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 {
struct sleep_state sls;
int error, error1;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
 
 #ifdef MULTIPROCESSOR
KASSERT(timo || __mp_lock_held(&kernel_lock));
@@ -120,10 +123,18 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 * don't run any other procs or panic below,
 * in case this is the idle process and already asleep.
 */
s = splhigh();
splx(safepri);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock))
+   hold_count = __mp_release_all(&kernel_lock);
+   else
+   hold_count = 0;
+   if (hold_count)
+   __mp_acquire_count(&kernel_lock, hold_count);
+#endif
splx(s);
return (0);
}
 
sleep_setup(&sls, ident, priority, wmesg);
@@ -149,10 +160,13 @@ int
 msleep(const volatile void *ident, struct mutex *mtx, int priority,
 const char *wmesg, int timo)
 {
struct sleep_state sls;
int error, error1, spl;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
KASSERT(mtx != NULL);
 
if (cold || panicstr) {
@@ -163,10 +177,18 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
 * in case this is the idle process and already asleep.
 */
spl = MUTEX_OLDIPL(mtx);
MUTEX_OLDIPL(mtx) = safepri;
mtx_leave(mtx);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock))
+   hold_count = __mp_release_all(&kernel_lock);
+   else
+   hold_count = 0;
+   if (hold_count)
+   __mp_acquire_count(&kernel_lock, hold_count);
+#endif
if ((priority & PNORELOCK) == 0) {
mtx_enter(mtx);
MUTEX_OLDIPL(mtx) = spl;
} else
splx(spl);
diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
index a26fbe2..a373789 100644
--- sys/kern/vfs_subr.c
+++ sys/kern/vfs_subr.c
@@ -1664,10 +1664,13 @@ int
 vfs_syncwait(int verbose)
 {
struct buf *bp;
int iter, nbusy, dcount, s;
struct proc *p;
+#ifdef MULTIPROCESSOR
+   int hold_count;
+#endif
 
p = curproc? curproc : &proc0;
sys_sync(p, (void *)0, (register_t *)0);
 
/* Wait for sync to finish. */
@@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
}
if (nbusy == 0)
break;
if (verbose)
printf("%d ", nbusy);
+#ifdef MULTIPROCESSOR
+   if (__mp_lock_held(&kernel_lock))
+   hold_count = __mp_release_all(&kernel_lock);
+   else
+   hold_count = 0;
+#endif
DELAY(4 * iter);
+#ifdef MULTIPROCESSOR
+   if (hold_count)
+   __mp_acquire_count(&kernel_lock, hold_count);
+#endif
}
 
return nbusy;
 }
 



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mike Belopuhov
On 8 May 2015 at 04:51, Masao Uebayashi  wrote:
> Doing the complex VFS shutdown operation modifies the system's memory
> image a lot after a panic has been triggered.  I'd totally skip
> vfs_shutdown() after a panic [1], then do our best to dump a kernel core
> for analysis.
>
> [1] OpenBSD's panic(9) sets RB_NOSYNC only when panicstr is already set
> (== recursive panic()) at this moment.

This is also a valid point of view and in fact I was surprised
to see sys_sync called from vfs_syncwait, esp. since it's doing
uvm_vnp_sync.  But on the other hand it has been there for 20
years in OpenBSD and who knows how many years before that in
*BSD and we got used to relying on this behavior, didn't we?



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mark Kettenis
> From: Mike Belopuhov 
> Date: Fri, 8 May 2015 13:53:56 +0200
> 
> On 8 May 2015 at 12:37, Martin Pieuchot  wrote:
> > On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
> >> As I've pointed out before, on panic we can be running on any
> >> CPU and our disk controller's interrupts can interrupt on the
> >> other one.  Since we'll most likely be holding a kernel lock,
> >> dealing with unlocking it might get hairy very fast.  Instead
> >> what we could do to improve the chances of a clean shutdown on
> >> panic is to instruct our disk subsystem to do polled I/O that
> >> will be run on the same CPU with the panic.
> >
> > Did you consider executing ddb's boot commands on cpu0?  I mean doing
> > an implicit "machine ddbcpu 0" before executing any "boot" command?
> >
> 
> But panic can't do it really.

And not all our architectures support the "machine ddbcpu" command.



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mike Belopuhov
On 8 May 2015 at 12:37, Martin Pieuchot  wrote:
> On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
>> As I've pointed out before, on panic we can be running on any
>> CPU and our disk controller's interrupts can interrupt on the
>> other one.  Since we'll most likely be holding a kernel lock,
>> dealing with unlocking it might get hairy very fast.  Instead
>> what we could do to improve the chances of a clean shutdown on
>> panic is to instruct our disk subsystem to do polled I/O that
>> will be run on the same CPU with the panic.
>
> Did you consider executing ddb's boot commands on cpu0?  I mean doing
> an implicit "machine ddbcpu 0" before executing any "boot" command?
>

But panic can't do it really.



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Martin Pieuchot
On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
> As I've pointed out before, on panic we can be running on any
> CPU and our disk controller's interrupts can interrupt on the
> other one.  Since we'll most likely be holding a kernel lock,
> dealing with unlocking it might get hairy very fast.  Instead
> what we could do to improve the chances of a clean shutdown on
> panic is to instruct our disk subsystem to do polled I/O that
> will be run on the same CPU with the panic.

Did you consider executing ddb's boot commands on cpu0?  I mean doing
an implicit "machine ddbcpu 0" before executing any "boot" command?

> Initially I wanted to move "cold = 1" earlier in boot(), but
> after talking to Miod, it started to look like a bad idea.
> 
> Thoughts?
> 
> diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
> index 1f52488..aea9ec1 100644
> --- sys/dev/ata/ata_wdc.c
> +++ sys/dev/ata/ata_wdc.c
> @@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
>   */
>  int
>  wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
>  {
>   struct wdc_xfer *xfer;
>   struct channel_softc *chp = drvp->chnl_softc;
>  
>   xfer = wdc_get_xfer(WDC_NOSLEEP);
>   if (xfer == NULL)
>   return WDC_TRY_AGAIN;
> + if (panicstr)
> + ata_bio->flags |= ATA_POLL;
>   if (ata_bio->flags & ATA_POLL)
>   xfer->c_flags |= C_POLL;
>   if (!(ata_bio->flags & ATA_POLL) &&
>   (drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
>   (ata_bio->flags & ATA_SINGLE) == 0 &&
>   (ata_bio->bcount > 512 ||
>   (chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
>   xfer->c_flags |= C_DMA;
>   xfer->drive = drvp->drive;
>   xfer->cmd = ata_bio;
> diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
> index 9cf6b45..3afcc29 100644
> --- sys/scsi/scsi_base.c
> +++ sys/scsi/scsi_base.c
> @@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
>   return (error);
>  }
>  
>  void
>  scsi_xs_exec(struct scsi_xfer *xs)
>  {
>   xs->error = XS_NOERROR;
>   xs->resid = xs->datalen;
>   xs->status = 0;
>   CLR(xs->flags, ITSDONE);
> + if (panicstr)
> + SET(xs->flags, SCSI_AUTOCONF);
>  
>  #ifdef SCSIDEBUG
>   if (xs->sc_link->flags & SDEV_DB1) {
>   scsi_xs_show(xs);
>   if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
>   scsi_show_mem(xs->data, min(64, xs->datalen));
>   }
>  #endif
>  
>   /* The adapter's scsi_cmd() is responsible for calling scsi_done(). */
> 



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mike Belopuhov
On 8 May 2015 at 11:43, Mark Kettenis  wrote:
>> Date: Thu, 7 May 2015 20:58:53 +0200
>> From: Mike Belopuhov 
>>
>> As I've pointed out before, on panic we can be running on any
>> CPU and our disk controller's interrupts can interrupt on the
>> other one.  Since we'll most likely be holding a kernel lock,
>> dealing with unlocking it might get hairy very fast.  Instead
>> what we could do to improve the chances of a clean shutdown on
>> panic is to instruct our disk subsystem to do polled I/O that
>> will be run on the same CPU with the panic.
>>
>> Initially I wanted to move "cold = 1" earlier in boot(), but
>> after talking to Miod, it started to look like a bad idea.
>>
>> Thoughts?
>
> I'm not necessarily against forcing polled I/O in this case, but it
> sucks a bit that we have to add checks in fairly low-level code to
> accomplish this.

Yes, but it's the place where we decide what kind of I/O we want
to do.  The SCSI I/O path in the controller or stack never falls
back to polled I/O (unless it's a specific method, i.e. sddump).

> And I'm not sure if disk I/O is the only thing that
> we should worry about here.
>

What else?

> I think tsleep(9) and msleep(9) need to release and re-acquire the
> kernel lock in the "cold || panicstr" case.

Well, it's not hard to do really, but...

> We might need this for
> handling interrupts during autoconf as soon as we start distributing
> interrupts over CPUs.
>

...cold might mean that interrupts are not ready yet.  So then we might
need another flag for shutdown?



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-08 Thread Mark Kettenis
> Date: Thu, 7 May 2015 20:58:53 +0200
> From: Mike Belopuhov 
> 
> As I've pointed out before, on panic we can be running on any
> CPU and our disk controller's interrupts can interrupt on the
> other one.  Since we'll most likely be holding a kernel lock,
> dealing with unlocking it might get hairy very fast.  Instead
> what we could do to improve the chances of a clean shutdown on
> panic is to instruct our disk subsystem to do polled I/O that
> will be run on the same CPU with the panic.
> 
> Initially I wanted to move "cold = 1" earlier in boot(), but
> after talking to Miod, it started to look like a bad idea.
> 
> Thoughts?

I'm not necessarily against forcing polled I/O in this case, but it
sucks a bit that we have to add checks in fairly low-level code to
accomplish this.  And I'm not sure if disk I/O is the only thing that
we should worry about here.

I think tsleep(9) and msleep(9) need to release and re-acquire the
kernel lock in the "cold || panicstr" case.  We might need this for
handling interrupts during autoconf as soon as we start distributing
interrupts over CPUs.

> diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
> index 1f52488..aea9ec1 100644
> --- sys/dev/ata/ata_wdc.c
> +++ sys/dev/ata/ata_wdc.c
> @@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
>   */
>  int
>  wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
>  {
>   struct wdc_xfer *xfer;
>   struct channel_softc *chp = drvp->chnl_softc;
>  
>   xfer = wdc_get_xfer(WDC_NOSLEEP);
>   if (xfer == NULL)
>   return WDC_TRY_AGAIN;
> + if (panicstr)
> + ata_bio->flags |= ATA_POLL;
>   if (ata_bio->flags & ATA_POLL)
>   xfer->c_flags |= C_POLL;
>   if (!(ata_bio->flags & ATA_POLL) &&
>   (drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
>   (ata_bio->flags & ATA_SINGLE) == 0 &&
>   (ata_bio->bcount > 512 ||
>   (chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
>   xfer->c_flags |= C_DMA;
>   xfer->drive = drvp->drive;
>   xfer->cmd = ata_bio;
> diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
> index 9cf6b45..3afcc29 100644
> --- sys/scsi/scsi_base.c
> +++ sys/scsi/scsi_base.c
> @@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
>   return (error);
>  }
>  
>  void
>  scsi_xs_exec(struct scsi_xfer *xs)
>  {
>   xs->error = XS_NOERROR;
>   xs->resid = xs->datalen;
>   xs->status = 0;
>   CLR(xs->flags, ITSDONE);
> + if (panicstr)
> + SET(xs->flags, SCSI_AUTOCONF);
>  
>  #ifdef SCSIDEBUG
>   if (xs->sc_link->flags & SDEV_DB1) {
>   scsi_xs_show(xs);
>   if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
>   scsi_show_mem(xs->data, min(64, xs->datalen));
>   }
>  #endif
>  
>   /* The adapter's scsi_cmd() is responsible for calling scsi_done(). */
> 
> 



Re: vfs_shutdown would like to do polled I/O at least on panic

2015-05-07 Thread Masao Uebayashi
Doing the complex VFS shutdown operation modifies the system's memory
image a lot after a panic has been triggered.  I'd totally skip
vfs_shutdown() after a panic [1], then do our best to dump a kernel core
for analysis.

[1] OpenBSD's panic(9) sets RB_NOSYNC only when panicstr is already set
(== recursive panic()) at this moment.



vfs_shutdown would like to do polled I/O at least on panic

2015-05-07 Thread Mike Belopuhov
As I've pointed out before, on panic we can be running on any
CPU and our disk controller's interrupts can interrupt on the
other one.  Since we'll most likely be holding a kernel lock,
dealing with unlocking it might get hairy very fast.  Instead
what we could do to improve the chances of a clean shutdown on
panic is to instruct our disk subsystem to do polled I/O that
will be run on the same CPU with the panic.

Initially I wanted to move "cold = 1" earlier in boot(), but
after talking to Miod, it started to look like a bad idea.

Thoughts?

diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
index 1f52488..aea9ec1 100644
--- sys/dev/ata/ata_wdc.c
+++ sys/dev/ata/ata_wdc.c
@@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
  */
 int
 wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
 {
struct wdc_xfer *xfer;
struct channel_softc *chp = drvp->chnl_softc;
 
xfer = wdc_get_xfer(WDC_NOSLEEP);
if (xfer == NULL)
return WDC_TRY_AGAIN;
+   if (panicstr)
+   ata_bio->flags |= ATA_POLL;
if (ata_bio->flags & ATA_POLL)
xfer->c_flags |= C_POLL;
if (!(ata_bio->flags & ATA_POLL) &&
(drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
(ata_bio->flags & ATA_SINGLE) == 0 &&
(ata_bio->bcount > 512 ||
(chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
xfer->c_flags |= C_DMA;
xfer->drive = drvp->drive;
xfer->cmd = ata_bio;
diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
index 9cf6b45..3afcc29 100644
--- sys/scsi/scsi_base.c
+++ sys/scsi/scsi_base.c
@@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
return (error);
 }
 
 void
 scsi_xs_exec(struct scsi_xfer *xs)
 {
xs->error = XS_NOERROR;
xs->resid = xs->datalen;
xs->status = 0;
CLR(xs->flags, ITSDONE);
+   if (panicstr)
+   SET(xs->flags, SCSI_AUTOCONF);
 
 #ifdef SCSIDEBUG
if (xs->sc_link->flags & SDEV_DB1) {
scsi_xs_show(xs);
if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
scsi_show_mem(xs->data, min(64, xs->datalen));
}
 #endif
 
/* The adapter's scsi_cmd() is responsible for calling scsi_done(). */