Re: 12-Current panics on boot (didn't a week ago.)
The drm-next-kmod and drm-stable-kmod modules panic for me. I will attach logs when I can.

On Friday, March 30, 2018, Andrew Reilly wrote:
> Hi Jonathan, all,
>
> I've just compiled and booted a kernel derived from current-GENERIC
> but with nooptions TCP_BLACKBOX, and much to my surprise it boots.
> Possible link to network-related activities is that the next line
> of boot output that was not being displayed during the crash is:
>
> [ath_hal] loaded
>
> That's vaguely network-shaped: could it be an issue?
>
> Please let me know if there's anything else that I could test or
> poke, in order to find the real culprit.
>
> My make.conf says:
>
> KERNCONF=ZEN
> WRKDIRPREFIX=/usr/obj/ports
> MALLOC_PRODUCTION=yes
>
> My /usr/src/sys/amd64/conf/ZEN says:
>
> include GENERIC
> nooptions TCP_BLACKBOX
>
> Uname -a says:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat
> Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN
> amd64
>
> Cheers,
>
> Andrew
>
>
> Here's the top part of the new dmesg.boot, FYI:
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
> root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
> FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM
> 6.0.0)
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(vga): resolution 640x480
> CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.45-MHz K8-class
> CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
> Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x7ed8320b SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2e500800
> AMD Features2=0x35c233ff Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
> Structured Extended Features=0x209c01a9 BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
> XSAVE Features=0xf
> AMD Extended Feature Extensions ID EBX=0x7
> SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
> TSC: P-state invariant, performance statistics
> real memory = 34359738368 (32768 MB)
> avail memory = 33271214080 (31729 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
> random: unblocking device.
> Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid
> Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
> ioapic0 irqs 0-23 on motherboard
> ioapic1 irqs 24-55 on motherboard
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #4 Launched!
> SMP: AP CPU #1 Launched!
> Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
> random: entropy device external interface
> [ath_hal] loaded
> module_register_init: MOD_LOAD (vesa, 0x8109f600, 0) error 19
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> kbd1 at kbdmux0
> netmap: loaded module
> nexus0
> vtvga0: on motherboard
> cryptosoft0: on motherboard
> aesni0: on motherboard
> acpi0: on motherboard
> acpi0: Power Button (fixed)
> cpu0: on acpi0
> cpu1: on acpi0
> cpu2: on acpi0
> cpu3: on acpi0
> cpu4: on acpi0
> cpu5: on acpi0
> cpu6: on acpi0
> cpu7: on acpi0
> attimer0: port 0x40-0x43 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0: port 0x70-0x71 on acpi0
> atrtc0: registered as a time-of-day clock, resolution 1.00s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0: iomem 0xfed0-0xfed003ff irq 0,8 on
> acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 950
> Event timer "HPET" frequency 14318180 Hz quality 350
> Event timer "HPET1" frequency 14318180 Hz quality 350
> Event timer "HPET2" frequency 14318180 Hz quality 350
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> amdsmn0: on hostb0
> amdtemp0: on hostb0
>
>
> On Sun, Mar 25, 2018 at 04:35:31AM +, Jonathan Looney wrote:
> > For now, you can update through r331485 and then take TCP_BLACKBOX out of
> > your kernel config file. That won’t really “fix”
Re: 12-Current panics on boot (didn't a week ago.)
Hi Jonathan, all,

I've just compiled and booted a kernel derived from current-GENERIC
but with nooptions TCP_BLACKBOX, and much to my surprise it boots.
Possible link to network-related activities is that the next line
of boot output that was not being displayed during the crash is:

[ath_hal] loaded

That's vaguely network-shaped: could it be an issue?

Please let me know if there's anything else that I could test or
poke, in order to find the real culprit.

My make.conf says:

KERNCONF=ZEN
WRKDIRPREFIX=/usr/obj/ports
MALLOC_PRODUCTION=yes

My /usr/src/sys/amd64/conf/ZEN says:

include GENERIC
nooptions TCP_BLACKBOX

Uname -a says:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64

Cheers,

Andrew

Here's the top part of the new dmesg.boot, FYI:
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.45-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
Features=0x178bfbff
Features2=0x7ed8320b
AMD Features=0x2e500800
AMD Features2=0x35c233ff
Structured Extended Features=0x209c01a9
XSAVE Features=0xf
AMD Extended Feature Extensions ID EBX=0x7
SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33271214080 (31729 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-55 on motherboard
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
random: entropy device external interface
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x8109f600, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
netmap: loaded module
nexus0
vtvga0: on motherboard
cryptosoft0: on motherboard
aesni0: on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
cpu4: on acpi0
cpu5: on acpi0
cpu6: on acpi0
cpu7: on acpi0
attimer0: port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.00s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
amdsmn0: on hostb0
amdtemp0: on hostb0

On Sun, Mar 25, 2018 at 04:35:31AM +, Jonathan Looney wrote:
> For now, you can update through r331485 and then take TCP_BLACKBOX out of
> your kernel config file. That won’t really “fix” anything, but should at
> least get you a booting system (assuming the new code from r331347 is
> really triggering a problem).
>
>
> I’ll take another look to see if I missed something in the commit. But, at
> the moment, I’m hard-pressed to see how r331347 would cause the problem you
> describe.
>
>
> Jonathan
>
> On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly
>
Re: 12-Current panics on boot (didn't a week ago.)
For now, you can update through r331485 and then take TCP_BLACKBOX out of your kernel config file. That won’t really “fix” anything, but should at least get you a booting system (assuming the new code from r331347 is really triggering a problem).

I’ll take another look to see if I missed something in the commit. But, at the moment, I’m hard-pressed to see how r331347 would cause the problem you describe.

Jonathan

On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly wrote:
> OK, I've completed the search: r331346 works, r331347 panics
> somewhere in the initialization of random.
>
> In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> anything obvious to tweak, unfortunately. It's a fair chunk of new
> code but it's all network-stack related, and my kernel is panicking
> long before any network activity happens.
>
> Any suggestions?
>
> Cheers,
>
> Andrew
>
> On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> > Thanks Andrew... I can't recreate this on my VM nor my real hardware.
> >
> > Warner
> >
> > On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly wrote:
> >
> > > So, r331464 crashes in the same place, on my system. r331064 still boots
> > > OK. I'll keep searching.
> > >
> > > One week ago there was a change to randomdev to poll for signals every so
> > > often, as a defence against very large reads. That wouldn't have
> > > introduced a race somewhere,
> > > or left things in an unexpected state, perhaps? That change (r331070) by
> > > cem@ is just a few revisions after the one that is working for me. I'll
> > > start looking there...
> > >
> > > Cheers,
> > >
> > > Andrew
> > >
> > > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > > Hi Warner,
> > > >
> > > > The breakage was in 331470, and at least one version earlier, that I
> > > > updated past when it panicked.
> > > >
> > > > I'm guessing that kdb's inability to dump would be down to it not having
> > > > found any disk devices yet, right? So yes, bisecting to narrow down the
> > > > issue is probably the best bet. I'll try your r331464: if that works that
> > > > leaves only four or five revisions. Of course the breakage could be
> > > > hardware specific.
> > > >
> > > > Cheers,
> > > > --
> > > > Andrew

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: 12-Current panics on boot (didn't a week ago.)
On Sun, 25 Mar 2018 05:21:10 +0200, Andrew Reilly wrote:
>
> OK, I've completed the search: r331346 works, r331347 panics
> somewhere in the initialization of random.
>
> In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> anything obvious to tweak, unfortunately. It's a fair chunk of new
> code but it's all network-stack related, and my kernel is panicking
> long before any network activity happens.
>
> Any suggestions?

Does your system boot if you upgrade to at least r331485 and remove "options TCP_BLACKBOX" from sys/amd64/conf/GENERIC (if you build and run GENERIC)?

--
Herbert
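The workaround discussed above relies on a later "nooptions" directive cancelling an earlier "options" line once the included file has been read. A minimal Python sketch of that ordering rule follows; it assumes the include has already been expanded into one list of lines, ignores quoted values and every other config directive, and the function name and the two-line GENERIC excerpt are hypothetical, written only for illustration.

```python
def effective_options(config_lines):
    """Return the option names still enabled after applying 'options' and
    'nooptions' lines in order. Simplified: includes are assumed to be
    expanded already, and values after '=' are discarded."""
    enabled = set()
    for raw in config_lines:
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        directive, rest = parts
        names = [item.split("=", 1)[0].strip() for item in rest.split(",")]
        if directive == "options":
            enabled.update(name for name in names if name)
        elif directive == "nooptions":
            enabled.difference_update(names)
    return enabled


# Hypothetical excerpt standing in for the real GENERIC file.
generic_excerpt = ["options \tINET", "options \tTCP_BLACKBOX"]
zen = generic_excerpt + ["nooptions TCP_BLACKBOX"]   # "include GENERIC" expanded by hand

print("TCP_BLACKBOX" in effective_options(generic_excerpt))
print("TCP_BLACKBOX" in effective_options(zen))
```

Keeping the change in a small file that includes GENERIC, as in the ZEN config quoted earlier in the thread, leaves GENERIC itself unmodified across source updates.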
Re: 12-Current panics on boot (didn't a week ago.)
OK, I've completed the search: r331346 works, r331347 panics
somewhere in the initialization of random.

In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
anything obvious to tweak, unfortunately. It's a fair chunk of new
code but it's all network-stack related, and my kernel is panicking
long before any network activity happens.

Any suggestions?

Cheers,

Andrew

On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> Thanks Andrew... I can't recreate this on my VM nor my real hardware.
>
> Warner
>
> On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly wrote:
>
> > So, r331464 crashes in the same place, on my system. r331064 still boots
> > OK. I'll keep searching.
> >
> > One week ago there was a change to randomdev to poll for signals every so
> > often, as a defence against very large reads. That wouldn't have
> > introduced a race somewhere,
> > or left things in an unexpected state, perhaps? That change (r331070) by
> > cem@ is just a few revisions after the one that is working for me. I'll
> > start looking there...
> >
> > Cheers,
> >
> > Andrew
> >
> > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > Hi Warner,
> > >
> > > The breakage was in 331470, and at least one version earlier, that I
> > > updated past when it panicked.
> > >
> > > I'm guessing that kdb's inability to dump would be down to it not having
> > > found any disk devices yet, right? So yes, bisecting to narrow down the
> > > issue is probably the best bet. I'll try your r331464: if that works that
> > > leaves only four or five revisions. Of course the breakage could be
> > > hardware specific.
> > >
> > > Cheers,
> > > --
> > > Andrew
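The search reported above is a bisection over revision numbers between a known-good and a known-bad kernel. A minimal Python sketch of the idea follows; the function name and the boots() callback are hypothetical, and in practice each probe means updating the source tree to that revision, rebuilding the kernel, and rebooting by hand. It also assumes the failure is monotonic, that is, every revision at or after the first bad one fails.

```python
def first_bad_revision(good, bad, boots):
    """Binary-search the revision range (good, bad] for the first revision
    that fails. 'good' is assumed to boot and 'bad' is assumed to panic;
    boots(rev) is a caller-supplied predicate (hypothetical here)."""
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if boots(mid):
            good = mid      # the breakage was introduced after mid
        else:
            bad = mid       # mid already shows the breakage
    return bad, tested


# Stand-in predicate using the outcome reported in the thread:
# r331346 boots, r331347 panics.
culprit, probes = first_bad_revision(331064, 331464, lambda rev: rev < 331347)
print(culprit, len(probes))
```

With r331064 known good and r331464 known bad, the 400-revision range needs at most nine builds, which is why bisecting is cheaper than reading every commit in between.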
Re: 12-Current panics on boot (didn't a week ago.)
So, r331464 crashes in the same place, on my system. r331064 still boots OK. I'll keep searching.

One week ago there was a change to randomdev to poll for signals every so often, as a defence against very large reads. That wouldn't have introduced a race somewhere, or left things in an unexpected state, perhaps? That change (r331070) by cem@ is just a few revisions after the one that is working for me. I'll start looking there...

Cheers,

Andrew

On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> Hi Warner,
>
> The breakage was in 331470, and at least one version earlier, that I updated
> past when it panicked.
>
> I'm guessing that kdb's inability to dump would be down to it not having
> found any disk devices yet, right? So yes, bisecting to narrow down the
> issue is probably the best bet. I'll try your r331464: if that works that
> leaves only four or five revisions. Of course the breakage could be hardware
> specific.
>
> Cheers,
> --
> Andrew
Re: 12-Current panics on boot (didn't a week ago.)
Thanks Andrew... I can't recreate this on my VM nor my real hardware.

Warner

On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly wrote:
> So, r331464 crashes in the same place, on my system. r331064 still boots
> OK. I'll keep searching.
>
> One week ago there was a change to randomdev to poll for signals every so
> often, as a defence against very large reads. That wouldn't have
> introduced a race somewhere,
> or left things in an unexpected state, perhaps? That change (r331070) by
> cem@ is just a few revisions after the one that is working for me. I'll
> start looking there...
>
> Cheers,
>
> Andrew
>
> On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > Hi Warner,
> >
> > The breakage was in 331470, and at least one version earlier, that I
> > updated past when it panicked.
> >
> > I'm guessing that kdb's inability to dump would be down to it not having
> > found any disk devices yet, right? So yes, bisecting to narrow down the
> > issue is probably the best bet. I'll try your r331464: if that works that
> > leaves only four or five revisions. Of course the breakage could be
> > hardware specific.
> >
> > Cheers,
> > --
> > Andrew
Re: 12-Current panics on boot (didn't a week ago.)
Hi Warner,

The breakage was in 331470, and at least one version earlier, that I updated past when it panicked.

I'm guessing that kdb's inability to dump would be down to it not having found any disk devices yet, right? So yes, bisecting to narrow down the issue is probably the best bet. I'll try your r331464: if that works that leaves only four or five revisions. Of course the breakage could be hardware specific.

Cheers,
--
Andrew

On 25 March 2018 1:14:40 am AEDT, Warner Losh wrote:
>Also, what rev failed? I booted r331464 last night w/o issue.
>
>Warner
>
>On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly wrote:
>
>> Hi all,
>>
>> For reasons that still escape me, I haven't been able to get a kernel dump
>> to debug, sorry.
>>
>> Just thought that I'd generate a fairly low-quality report, to see if
>> anyone has some ideas.
>>
>> The last kernel that I have that booted OK (and I'm now running) is:
>> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat
>> Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>> amd64
>>
>> The machine is a:
>> CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.46-MHz K8-class
>> CPU)
>> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
>> Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>
>> Kernels built from head as of a couple of hours ago get through launching
>> the other CPUs and then stops somewhere in random, apparently:
>>
>> SMP: AP CPU #2 Launched!
>> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
>> random: entpanic: mtx_lock() of spin mutex (null) @
>> /usr/src/sys/kern/subr_bus.c:617
>> cpuid = 0
>> time = 1
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe4507a0
>> vpanic() at vpanic+0x18d/frame 0xfe450800
>> doadump () at doadump/frame 0xfe450880
>> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
>> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
>> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
>> g_new_provider_event() at g_new_provider_event+0xfa/frame
>> 0xfe450a30
>> g_run_events() at g_run_events+0x151/frame 0xfe450a70
>> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> KDB: enter: panic
>> [ thread pid 14 tid 100052 ]
>> Stopped at kdb_enter+0x3b: movq $0,kdb_why
>> db> dump
>> Cannot dump: no dump device specified.
>> db>
>>
>> Now dumping worked fine the last time the kernel panicked: I have
>> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
>> /dev/zvol/root/swap
>> (second, larger than the first.)
>>
>> Root on the nvd0p2 is ZFS, and there's a four-drive raidZ with user
>> directories and what-not on them, and another ZFS on an external USB drive
>> that I use
>> for backups, unmounted.
>>
>> In the new kernels, we clearly aren't even getting as far as finding the
>> hubs and controllers, let alone the drives.
>>
>> I've attached dmesg.boot from the last boot from last week's good kernel.
>> (While briefly in yoyo mode I turned the SMT back on, so now there are 16
>> cores
>> instead of the eight mentioned in the crash dump. Didn't help, but I
>> haven't turned it back off yet.)
>>
>> Cheers,
>>
>> Andrew
Re: 12-Current panics on boot (didn't a week ago.)
Also, what rev failed? I booted r331464 last night w/o issue.

Warner

On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly wrote:
> Hi all,
>
> For reasons that still escape me, I haven't been able to get a kernel dump
> to debug, sorry.
>
> Just thought that I'd generate a fairly low-quality report, to see if
> anyone has some ideas.
>
> The last kernel that I have that booted OK (and I'm now running) is:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat
> Mar 17 07:54:51 AEDT 2018
> root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
>
> The machine is a:
> CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.46-MHz K8-class
> CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
> Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>
> Kernels built from head as of a couple of hours ago get through launching
> the other CPUs and then stops somewhere in random, apparently:
>
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
> random: entpanic: mtx_lock() of spin mutex (null) @
> /usr/src/sys/kern/subr_bus.c:617
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe4507a0
> vpanic() at vpanic+0x18d/frame 0xfe450800
> doadump () at doadump/frame 0xfe450880
> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
> g_new_provider_event() at g_new_provider_event+0xfa/frame
> 0xfe450a30
> g_run_events() at g_run_events+0x151/frame 0xfe450a70
> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 14 tid 100052 ]
> Stopped at kdb_enter+0x3b: movq $0,kdb_why
> db> dump
> Cannot dump: no dump device specified.
> db>
>
> Now dumping worked fine the last time the kernel panicked: I have
> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
> /dev/zvol/root/swap
> (second, larger than the first.)
>
> Root on the nvd0p2 is ZFS, and there's a four-drive raidZ with user
> directories and what-not on them, and another ZFS on an external USB drive
> that I use
> for backups, unmounted.
>
> In the new kernels, we clearly aren't even getting as far as finding the
> hubs and controllers, let alone the drives.
>
> I've attached dmesg.boot from the last boot from last week's good kernel.
> (While briefly in yoyo mode I turned the SMT back on, so now there are 16
> cores
> instead of the eight mentioned in the crash dump. Didn't help, but I
> haven't turned it back off yet.)
>
> Cheers,
>
> Andrew
Re: 12-Current panics on boot (didn't a week ago.)
That lock has been there for a long, long time (like 5 or 6 major releases)... It's surprising that it's causing issues now. Can you bisect versions to find when this starts happening?

Warner

On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly wrote:
> Hi all,
>
> For reasons that still escape me, I haven't been able to get a kernel dump
> to debug, sorry.
>
> Just thought that I'd generate a fairly low-quality report, to see if
> anyone has some ideas.
>
> The last kernel that I have that booted OK (and I'm now running) is:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat
> Mar 17 07:54:51 AEDT 2018
> root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
>
> The machine is a:
> CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.46-MHz K8-class
> CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
> Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>
> Kernels built from head as of a couple of hours ago get through launching
> the other CPUs and then stops somewhere in random, apparently:
>
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
> random: entpanic: mtx_lock() of spin mutex (null) @
> /usr/src/sys/kern/subr_bus.c:617
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe4507a0
> vpanic() at vpanic+0x18d/frame 0xfe450800
> doadump () at doadump/frame 0xfe450880
> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
> g_new_provider_event() at g_new_provider_event+0xfa/frame
> 0xfe450a30
> g_run_events() at g_run_events+0x151/frame 0xfe450a70
> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 14 tid 100052 ]
> Stopped at kdb_enter+0x3b: movq $0,kdb_why
> db> dump
> Cannot dump: no dump device specified.
> db>
>
> Now dumping worked fine the last time the kernel panicked: I have
> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
> /dev/zvol/root/swap
> (second, larger than the first.)
>
> Root on the nvd0p2 is ZFS, and there's a four-drive raidZ with user
> directories and what-not on them, and another ZFS on an external USB drive
> that I use
> for backups, unmounted.
>
> In the new kernels, we clearly aren't even getting as far as finding the
> hubs and controllers, let alone the drives.
>
> I've attached dmesg.boot from the last boot from last week's good kernel.
> (While briefly in yoyo mode I turned the SMT back on, so now there are 16
> cores
> instead of the eight mentioned in the crash dump. Didn't help, but I
> haven't turned it back off yet.)
>
> Cheers,
>
> Andrew
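The backtrace quoted above lists frames innermost first, and quoting plus line wrapping make it hard to read in the archive. A small Python sketch that pulls the call chain out of such text follows; the regular expression is an assumption fitted to the frame lines shown in this thread, not a general ddb parser, and the function name is hypothetical.

```python
import re

# Matches lines such as "vpanic() at vpanic+0x18d/frame 0x..." and
# "doadump () at doadump/frame 0x..."; the offset is optional.
FRAME_RE = re.compile(r"^(\w+)\s*\(\)\s+at\s+\1(?:\+(0x[0-9a-fA-F]+))?/frame")


def call_chain(text):
    """Return the function names found in a KDB stack backtrace, innermost
    frame first. Leading '>' quote markers are ignored, and wrapped frame
    addresses on their own line are skipped."""
    chain = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        match = FRAME_RE.match(line)
        if match:
            chain.append(match.group(1))
    return chain


# Excerpt copied from the report quoted in this thread.
TRACE = """\
> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
> g_new_provider_event() at g_new_provider_event+0xfa/frame
> 0xfe450a30
> g_run_events() at g_run_events+0x151/frame 0xfe450a70
"""

print(" <- ".join(call_chain(TRACE)))
```

Read this way, the mutex complaint is raised under devctl_queue_data_f(), reached from g_dev_taste() on the GEOM event path, which matches the subr_bus.c location named in the panic message.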
12-Current panics on boot (didn't a week ago.)
Hi all,

For reasons that still escape me, I haven't been able to get a kernel dump to debug, sorry.

Just thought that I'd generate a fairly low-quality report, to see if anyone has some ideas.

The last kernel that I have that booted OK (and I'm now running) is:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

The machine is a:
CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.46-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
Features=0x178bfbff

Kernels built from head as of a couple of hours ago get through launching the other CPUs and then stop somewhere in random, apparently:

SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
random: entpanic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/subr_bus.c:617
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4507a0
vpanic() at vpanic+0x18d/frame 0xfe450800
doadump () at doadump/frame 0xfe450880
__mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
g_new_provider_event() at g_new_provider_event+0xfa/frame 0xfe450a30
g_run_events() at g_run_events+0x151/frame 0xfe450a70
fork_exit() at fork_exit+0x84/frame 0xfe450ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 14 tid 100052 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> dump
Cannot dump: no dump device specified.
db>

Now dumping worked fine the last time the kernel panicked: I have dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and /dev/zvol/root/swap (second, larger than the first.)

Root on the nvd0p2 is ZFS, and there's a four-drive raidZ with user directories and what-not on them, and another ZFS on an external USB drive that I use for backups, unmounted.

In the new kernels, we clearly aren't even getting as far as finding the hubs and controllers, let alone the drives.

I've attached dmesg.boot from the last boot from last week's good kernel. (While briefly in yoyo mode I turned the SMT back on, so now there are 16 cores instead of the eight mentioned in the crash dump. Didn't help, but I haven't turned it back off yet.)

Cheers,

Andrew

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018
root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.46-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
Features=0x178bfbff
Features2=0x7ed8320b
AMD Features=0x2e500800
AMD Features2=0x35c233ff
Structured Extended Features=0x209c01a9
XSAVE Features=0xf
AMD Extended Feature Extensions ID EBX=0x7
SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33272578048 (31731 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s) x 2 hardware threads
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
ioapic0: Changing APIC ID to 17
ioapic1: Changing APIC ID to 18
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-55 on motherboard
SMP: AP CPU #12 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #7 Launched!
SMP: