Re: config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-17 Thread Michael van Elst
jaromir.dole...@gmail.com (Jaromír Doleček) writes:

>I confirmed via ddb that this happens due to config_mountroot_thread()
>holding the kernel lock for too long

Probably because some driver is busy-waiting while holding the lock.

I had the same with lots of console output plus long delay() calls in sdmmc.
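
The "spinout" in the subject line can be pictured with a small userspace model: a waiter spins on a lock with a bounded budget and reports failure if the holder never lets go in time. This is only a sketch of the general idea, not NetBSD's kernel lock code; `struct big_lock`, `big_lock_acquire()`, `big_lock_release()` and the `budget` parameter are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a big kernel lock; not NetBSD's implementation. */
struct big_lock {
	atomic_flag held;
};

/*
 * Try to take the lock, giving up after `budget` failed attempts.
 * Returns true on success, false on "spinout" (the holder kept the
 * lock for the whole budget, e.g. because it was busy-waiting).
 */
static bool
big_lock_acquire(struct big_lock *bl, unsigned long budget)
{
	assert(bl != NULL);
	for (unsigned long spins = 0; spins < budget; spins++) {
		if (!atomic_flag_test_and_set_explicit(&bl->held,
		    memory_order_acquire))
			return true;	/* lock was free, now ours */
	}
	return false;			/* budget exhausted: spinout */
}

/* Release the lock so that a spinning waiter can get in. */
static void
big_lock_release(struct big_lock *bl)
{
	atomic_flag_clear_explicit(&bl->held, memory_order_release);
}
```

With the lock already held, `big_lock_acquire(&bl, 1000)` returns false after 1000 attempts; after `big_lock_release(&bl)` the same call succeeds on the first try. A holder that sits in a long delay loop has the same effect on the waiter as the never-released lock here.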

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: panic: softint screwup

2020-02-17 Thread Andrew Doran
On Wed, Feb 12, 2020 at 11:09:24AM +, Andrew Doran wrote:
> On Tue, Feb 11, 2020 at 07:26:29AM +, Nick Hudson wrote:
> > On 04/02/2020 23:17, Andrew Doran wrote:
> > > On Tue, Feb 04, 2020 at 07:03:28AM -0400, Jared McNeill wrote:
> > >
> > >> First time seeing this one.. an arm64 board sitting idle at the login
> > >> prompt rebooted itself with this panic. Unfortunately the default
> > >> ddb.onpanic=0 strikes again and I can't get any more information than
> > >> this:
> > >
> > > I added this recently to replace a vague KASSERT.  Thanks for grabbing the
> > > output.
> > >
> > >> [ 364.3342263] curcpu=0, spl=4 curspl=7
> > >> [ 364.3342263] onproc=0x00237f743080 => l_stat=7 l_flag=2201 l_cpu=0
> > >> [ 364.3342263] curlwp=0x00237f71e580 => l_stat=1 l_flag=0200 l_cpu=0
> > >> [ 364.3342263] pinned=0x00237f71e100 => l_stat=7 l_flag=0200 l_cpu=0
> > >> [ 364.3342263] panic: softint screwup
> > >> [ 364.3342263] cpu0: Begin traceback...
> > >> [ 364.3342263] trace fp ffc101da7be0
> > >> [ 364.3342263] fp ffc101da7c00 vpanic() at ffc0004ad728 netbsd:vpanic+0x160
> > >> [ 364.3342263] fp ffc101da7c70 panic() at ffc0004ad81c netbsd:panic+0x44
> > >> [ 364.3342263] fp ffc101da7d40 softint_dispatch() at ffc00047bda4 netbsd:softint_dispatch+0x5c4
> > >> [ 364.3342263] fp ffc101d9fc30 cpu_switchto_softint() at ffc85198 netbsd:cpu_switchto_softint+0x68
> > >> [ 364.3342263] fp ffc101d9fc80 splx() at ffc040d4 netbsd:splx+0xbc
> > >> [ 364.3342263] fp ffc101d9fcb0 callout_softclock() at ffc000489e04 netbsd:callout_softclock+0x36c
> > >> [ 364.3342263] fp ffc101d9fd40 softint_dispatch() at ffc00047b8dc netbsd:softint_dispatch+0xfc
> > >> [ 364.3342263] fp ffc101d3fcc0 cpu_switchto_softint() at ffc85198 netbsd:cpu_switchto_softint+0x68
> > >> [ 364.3342263] fp ffc101d3fdf8 cpu_idle() at ffc86128 netbsd:cpu_idle+0x58
> > >> [ 364.3342263] fp ffc101d3fe40 idle_loop() at ffc0004546a4 netbsd:idle_loop+0x174
> > >
> > > Something has cleared the LW_RUNNING flag on softclk/0 between where it is
> > > set (unlocked) at line 884 of kern_softint.c and callout_softclock().
> > 
> > Isn't it the case that softclk/0 is the victim/interrupted LWP for a
> > soft{serial,net,bio}?
> > That's certainly how I read the FP values.
> > 
> > Maybe the callout handler blocked and softclk/0 became a victim as well?
> > 
> > http://src.illumos.org/source/xref/netbsd-src/sys/kern/kern_synch.c#687
> > 
> > A soft{serial,net,bio} happens before curlwp is changed away from the
> > blocking softint thread.
> 
> I suspect putting the RUNNING flag back into l_pflag will cure it, since
> the update of l_flag without the LWP locked is dodgy. I can't think of
> anything that would clobber the update, but it is breaking the rules, so
> to speak.
> 
> I'll do just that on Saturday once back in front of a real computer.

Change made on Saturday.
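
The general hazard behind "the update of l_flag without the LWP locked is dodgy" is a lost update: two unlocked read-modify-write sequences on the same word can overwrite each other. The sketch below replays one such interleaving deterministically in a single thread. It is not a diagnosis of this panic (nothing in the thread identifies an actual clobbering writer), and it assumes that a private-flags word such as l_pflag is written only by the owning LWP. `struct fake_lwp`, the `FLAG_*` bits and both functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real LW_* values. */
#define FLAG_RUNNING	0x1u
#define FLAG_OTHER	0x2u

/* Hypothetical LWP-like structure with a shared and a private flag word. */
struct fake_lwp {
	uint32_t shared_flags;	/* meant to be changed only with a lock held */
	uint32_t private_flags;	/* changed only by the owning thread */
};

/*
 * Replay one unlucky interleaving of two unlocked read-modify-write
 * sequences (A sets RUNNING, B sets OTHER) on the same word.
 * Returns the final value of the shared word.
 */
static uint32_t
racy_shared_update(struct fake_lwp *l)
{
	uint32_t seen_by_a = l->shared_flags;		/* A loads */
	uint32_t seen_by_b = l->shared_flags;		/* B loads */

	l->shared_flags = seen_by_a | FLAG_RUNNING;	/* A stores */
	l->shared_flags = seen_by_b | FLAG_OTHER;	/* B stores a stale value */
	return l->shared_flags;
}

/* Same interleaving, but A's bit lives in a word that only A writes. */
static void
split_update(struct fake_lwp *l)
{
	uint32_t seen_by_b = l->shared_flags;		/* B loads */

	l->private_flags |= FLAG_RUNNING;		/* A stores, no contention */
	l->shared_flags = seen_by_b | FLAG_OTHER;	/* B stores */
}
```

In the first function B's store wipes out A's RUNNING bit; in the second both bits survive because the two writers no longer share a word.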

Andrew


re: config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-17 Thread matthew green
FWIW, i've been running my radeon with a patch that explicitly drops
the kernel lock around the "real attach" function (the one that config
mountroot ends up calling.)

we really need to MPSAFE-ify the autoconf subsystem.  right now, it
is expected that autoconf runs with kernel lock held... i am not sure of
the path we should take for this -- but let's actually have a design
in place we are happy with.  while my change below works, it's ugly
and wrong.
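
for readers who want to poke at the drop-and-reacquire pattern outside the kernel, here is a rough userspace model.  it is only a sketch: `big_lock`, `big_trylock()`, `big_unlock()`, `slow_attach()` and the two `attach_*` functions are hypothetical names, a C11 atomic flag stands in for the kernel lock, and the "slow" step just probes whether anyone else could have taken the lock while it ran.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel lock. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Set by slow_attach(): could someone else have taken the lock meanwhile? */
static bool others_could_run;

static bool
big_trylock(void)
{
	return !atomic_flag_test_and_set(&big_lock);
}

static void
big_unlock(void)
{
	atomic_flag_clear(&big_lock);
}

/* Hypothetical slow attach step; probes the lock instead of really waiting. */
static void
slow_attach(void)
{
	if (big_trylock()) {
		others_could_run = true;
		big_unlock();
	} else {
		others_could_run = false;
	}
}

/* Caller holds big_lock throughout, as autoconf is described as doing. */
static void
attach_holding_lock(void)
{
	slow_attach();
}

/* Caller holds big_lock on entry and again on return. */
static void
attach_dropping_lock(void)
{
	big_unlock();			/* analogous to KERNEL_UNLOCK_ONE(NULL) */
	slow_attach();
	while (!big_trylock())		/* analogous to KERNEL_LOCK(1, NULL) */
		continue;
}
```

with the lock held by the caller, `attach_holding_lock()` leaves `others_could_run` false, while `attach_dropping_lock()` leaves it true and returns with the lock held again.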


.mrg.


Index: sys/external/bsd/drm2/radeon/radeon_pci.c
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/radeon/radeon_pci.c,v
retrieving revision 1.14
diff -p -u -r1.14 radeon_pci.c
--- sys/external/bsd/drm2/radeon/radeon_pci.c   24 Jan 2020 11:44:27 -  1.14
+++ sys/external/bsd/drm2/radeon/radeon_pci.c   17 Feb 2020 16:54:05 -
@@ -229,6 +229,9 @@ radeon_attach_real(device_t self)
    unsigned long flags;
    int error;
 
+   /* XXXSMP autoconf */
+   KERNEL_UNLOCK_ONE(NULL);
+
    ok = radeon_pci_lookup(pa, );
    KASSERT(ok);
 
@@ -274,6 +277,9 @@ radeon_attach_real(device_t self)
    }
 
 out:   sc->sc_dev = self;
+
+   /* XXXSMP autoconf */
+   KERNEL_LOCK(1, NULL);
 }
 
 static int