Re: 12-Current panics on boot (didn't a week ago.)

2018-03-31 Thread Joe Maloney
The drm-next-kmod and drm-stable-kmod modules panic for me.  I will attach
logs when I can.

On Friday, March 30, 2018, Andrew Reilly  wrote:

> Hi Jonathan, all,
>
> I've just compiled and booted a kernel derived from current-GENERIC
> but with nooptions TCP_BLACKBOX, and much to my surprise it boots.
> A possible link to network-related activity: the next line of boot
> output, the one that was not being displayed during the crash, is:
>
> [ath_hal] loaded
>
> That's vaguely network-shaped: could it be an issue?
>
> Please let me know if there's anything else that I could test or
> poke, in order to find the real culprit.
>
> My make.conf says:
>
> KERNCONF=ZEN
> WRKDIRPREFIX=/usr/obj/ports
> MALLOC_PRODUCTION=yes
>
> My /usr/src/sys/amd64/conf/ZEN says:
>
> include GENERIC
> nooptions TCP_BLACKBOX
>
> uname -a says:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN  amd64
>
> Cheers,
>
> Andrew
>
>
> Here's the top part of the new dmesg.boot, FYI:
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
> root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
> FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(vga): resolution 640x480
> CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.45-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x7ed8320b SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>   AMD Features=0x2e500800
>   AMD Features2=0x35c233ff Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
>   Structured Extended Features=0x209c01a9 BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
>   XSAVE Features=0xf
>   AMD Extended Feature Extensions ID EBX=0x7
>   SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>   TSC: P-state invariant, performance statistics
> real memory  = 34359738368 (32768 MB)
> avail memory = 33271214080 (31729 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: 
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
> random: unblocking device.
> Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
> ioapic0  irqs 0-23 on motherboard
> ioapic1  irqs 24-55 on motherboard
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #4 Launched!
> SMP: AP CPU #1 Launched!
> Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
> random: entropy device external interface
> [ath_hal] loaded
> module_register_init: MOD_LOAD (vesa, 0x8109f600, 0) error 19
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> kbd1 at kbdmux0
> netmap: loaded module
> nexus0
> vtvga0:  on motherboard
> cryptosoft0:  on motherboard
> aesni0:  on motherboard
> acpi0:  on motherboard
> acpi0: Power Button (fixed)
> cpu0:  on acpi0
> cpu1:  on acpi0
> cpu2:  on acpi0
> cpu3:  on acpi0
> cpu4:  on acpi0
> cpu5:  on acpi0
> cpu6:  on acpi0
> cpu7:  on acpi0
> attimer0:  port 0x40-0x43 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0:  port 0x70-0x71 on acpi0
> atrtc0: registered as a time-of-day clock, resolution 1.00s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 950
> Event timer "HPET" frequency 14318180 Hz quality 350
> Event timer "HPET1" frequency 14318180 Hz quality 350
> Event timer "HPET2" frequency 14318180 Hz quality 350
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> pcib0:  port 0xcf8-0xcff on acpi0
> pci0:  on pcib0
> amdsmn0:  on hostb0
> amdtemp0:  on hostb0
>
>
> On Sun, Mar 25, 2018 at 04:35:31AM +, Jonathan Looney wrote:
> > For now, you can update through r331485 and then take TCP_BLACKBOX out of
> > your kernel config file. That won’t really “fix” 

Re: 12-Current panics on boot (didn't a week ago.)

2018-03-30 Thread Andrew Reilly
Hi Jonathan, all,

I've just compiled and booted a kernel derived from current-GENERIC
but with nooptions TCP_BLACKBOX, and much to my surprise it boots.
A possible link to network-related activity: the next line of boot
output, the one that was not being displayed during the crash, is:

[ath_hal] loaded

That's vaguely network-shaped: could it be an issue?

Please let me know if there's anything else that I could test or
poke, in order to find the real culprit.

My make.conf says:

KERNCONF=ZEN
WRKDIRPREFIX=/usr/obj/ports
MALLOC_PRODUCTION=yes

My /usr/src/sys/amd64/conf/ZEN says:

include GENERIC
nooptions TCP_BLACKBOX

uname -a says:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN  amd64

Cheers,

Andrew


Here's the top part of the new dmesg.boot, FYI:
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.45-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff
  Features2=0x7ed8320b
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  Structured Extended Features=0x209c01a9
  XSAVE Features=0xf
  AMD Extended Feature Extensions ID EBX=0x7
  SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33271214080 (31729 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-55 on motherboard
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
random: entropy device external interface
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x8109f600, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
netmap: loaded module
nexus0
vtvga0:  on motherboard
cryptosoft0:  on motherboard
aesni0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
cpu4:  on acpi0
cpu5:  on acpi0
cpu6:  on acpi0
cpu7:  on acpi0
attimer0:  port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0:  port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.00s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
amdsmn0:  on hostb0
amdtemp0:  on hostb0


On Sun, Mar 25, 2018 at 04:35:31AM +, Jonathan Looney wrote:
> For now, you can update through r331485 and then take TCP_BLACKBOX out of
> your kernel config file. That won’t really “fix” anything, but should at
> least get you a booting system (assuming the new code from r331347 is
> really triggering a problem).
> 
> 
> I’ll take another look to see if I missed something in the commit. But, at
> the moment, I’m hard-pressed to see how r331347 would cause the problem you
> describe.
> 
> 
> Jonathan
> 
> On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly 
> 

Re: 12-Current panics on boot (didn't a week ago.)

2018-03-25 Thread Jonathan Looney
For now, you can update through r331485 and then take TCP_BLACKBOX out of
your kernel config file. That won’t really “fix” anything, but should at
least get you a booting system (assuming the new code from r331347 is
really triggering a problem).


I’ll take another look to see if I missed something in the commit. But, at
the moment, I’m hard-pressed to see how r331347 would cause the problem you
describe.


Jonathan

On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly 
wrote:

> OK, I've completed the search: r331346 works, r331347 panics
> somewhere in the initialization of random.
>
> In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> anything obvious to tweak, unfortunately.  It's a fair chunk of new
> code but it's all network-stack related, and my kernel is panicking
> long before any network activity happens.
>
> Any suggestions?
>
> Cheers,
>
> Andrew
>
> On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> > Thanks Andrew... I can't recreate this on my VM nor my real hardware.
> >
> > Warner
> >
> > On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly 
> > wrote:
> >
> > > So, r331464 crashes in the same place, on my system.  r331064 still
> > > boots OK.  I'll keep searching.
> > >
> > > One week ago there was a change to randomdev to poll for signals every
> > > so often, as a defence against very large reads.  That wouldn't have
> > > introduced a race somewhere, or left things in an unexpected state,
> > > perhaps?  That change (r331070) by cem@ is just a few revisions after
> > > the one that is working for me.  I'll start looking there...
> > >
> > > Cheers,
> > >
> > > Andrew
> > >
> > > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > > Hi Warner,
> > > >
> > > > The breakage was in r331470, and in at least one earlier revision
> > > > that I had updated past when it panicked.
> > > >
> > > > I'm guessing that kdb's inability to dump would be down to it not
> > > > having found any disk devices yet, right?  So yes, bisecting to narrow
> > > > down the issue is probably the best bet.  I'll try your r331464: if
> > > > that works that leaves only four or five revisions.  Of course the
> > > > breakage could be hardware specific.
> > > >
> > > > Cheers,
> > > > --
> > > > Andrew
> > >
>
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-25 Thread Herbert J. Skuhra
On Sun, 25 Mar 2018 05:21:10 +0200, Andrew Reilly wrote:
> 
> OK, I've completed the search: r331346 works, r331347 panics
> somewhere in the initialization of random.
> 
> In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> anything obvious to tweak, unfortunately.  It's a fair chunk of new
> code but it's all network-stack related, and my kernel is panicking
> long before any network activity happens.
> 
> Any suggestions?

Does your system boot if you upgrade to at least r331485 and remove
"options TCP_BLACKBOX" from sys/amd64/conf/GENERIC (if you build and
run GENERIC)?

--
Herbert


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Andrew Reilly
OK, I've completed the search: r331346 works, r331347 panics
somewhere in the initialization of random.

In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
anything obvious to tweak, unfortunately.  It's a fair chunk of new
code but it's all network-stack related, and my kernel is panicking
long before any network activity happens.

Any suggestions?

Cheers,

Andrew

On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> Thanks Andrew... I can't recreate this on my VM nor my real hardware.
> 
> Warner
> 
> On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly 
> wrote:
> 
> > So, r331464 crashes in the same place, on my system.  r331064 still boots
> > OK.  I'll keep searching.
> >
> > One week ago there was a change to randomdev to poll for signals every so
> > often, as a defence against very large reads.  That wouldn't have
> > introduced a race somewhere, or left things in an unexpected state,
> > perhaps?  That change (r331070) by cem@ is just a few revisions after the
> > one that is working for me.  I'll start looking there...
> >
> > Cheers,
> >
> > Andrew
> >
> > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > Hi Warner,
> > >
> > > The breakage was in r331470, and in at least one earlier revision that
> > > I had updated past when it panicked.
> > >
> > > I'm guessing that kdb's inability to dump would be down to it not having
> > > found any disk devices yet, right?  So yes, bisecting to narrow down the
> > > issue is probably the best bet.  I'll try your r331464: if that works that
> > > leaves only four or five revisions.  Of course the breakage could be
> > > hardware specific.
> > >
> > > Cheers,
> > > --
> > > Andrew
> >


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Andrew Reilly
So, r331464 crashes in the same place, on my system.  r331064 still boots OK.  
I'll keep searching.

One week ago there was a change to randomdev to poll for signals every so 
often, as a defence against very large reads.  That wouldn't have introduced a 
race somewhere, or left things in an unexpected state, perhaps?  That change
(r331070) by cem@
is just a few revisions after the one that is working for me.  I'll start 
looking there...
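Andrew's hypothesis concerns a read loop that checks for pending signals between chunks, so that one very large read cannot run unchecked. A minimal user-space sketch of that idea follows; it is an analogy under stated assumptions, not the random(4) code from r331070. The names `chunked_fill` and `stop_requested` are hypothetical, the flag stands in for a pending-signal check, and `memset` stands in for the generator's output.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for "a signal is pending": in a real program a
 * signal handler would set this flag asynchronously.
 */
static volatile sig_atomic_t stop_requested;

/*
 * Fill dst with len bytes, chunk bytes at a time.  Between chunks, check
 * whether a stop was requested and, if so, return a short count instead
 * of finishing the whole request.  When len > 0 the first chunk is always
 * written, so every call makes progress.
 */
static size_t
chunked_fill(unsigned char *dst, size_t len, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < len) {
		if (done > 0 && stop_requested)
			break;			/* interrupted: short read */
		size_t n = len - done < chunk ? len - done : chunk;
		memset(dst + done, 0xA5, n);	/* placeholder for generator output */
		done += n;
	}
	return done;
}
```

With `stop_requested` clear, `chunked_fill(buf, 1000, 64)` returns 1000; with it set, the same call returns 64 after the first chunk. Always producing the first chunk guarantees forward progress even if the flag is already set when the read starts.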

Cheers,

Andrew

On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> Hi Warner,
> 
> The breakage was in r331470, and in at least one earlier revision that I
> had updated past when it panicked.
> 
> I'm guessing that kdb's inability to dump would be down to it not having 
> found any disk devices yet, right?  So yes, bisecting to narrow down the 
> issue is probably the best bet.  I'll try your r331464: if that works that 
> leaves only four or five revisions.  Of course the breakage could be hardware 
> specific.
> 
> Cheers,
> -- 
> Andrew


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Warner Losh
Thanks Andrew... I can't recreate this on my VM nor my real hardware.

Warner

On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly 
wrote:

> So, r331464 crashes in the same place, on my system.  r331064 still boots
> OK.  I'll keep searching.
>
> One week ago there was a change to randomdev to poll for signals every so
> often, as a defence against very large reads.  That wouldn't have
> introduced a race somewhere, or left things in an unexpected state,
> perhaps?  That change (r331070) by cem@ is just a few revisions after the
> one that is working for me.  I'll start looking there...
>
> Cheers,
>
> Andrew
>
> On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > Hi Warner,
> >
> > The breakage was in r331470, and in at least one earlier revision that I
> > had updated past when it panicked.
> >
> > I'm guessing that kdb's inability to dump would be down to it not having
> > found any disk devices yet, right?  So yes, bisecting to narrow down the
> > issue is probably the best bet.  I'll try your r331464: if that works that
> > leaves only four or five revisions.  Of course the breakage could be
> > hardware specific.
> >
> > Cheers,
> > --
> > Andrew
>


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Andrew Reilly
Hi Warner,

The breakage was in r331470, and in at least one earlier revision that I had
updated past when it panicked.

I'm guessing that kdb's inability to dump would be down to it not having found 
any disk devices yet, right?  So yes, bisecting to narrow down the issue is 
probably the best bet.  I'll try your r331464: if that works that leaves only 
four or five revisions.  Of course the breakage could be hardware specific.
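The search Warner suggested is a binary search over the revision window between a known-good and a known-bad kernel. A minimal sketch in C, assuming the failure is introduced by a single commit and persists in every later revision; `boots_fn`, `first_bad_revision`, and `simulated_boot` are hypothetical names, and each real probe is a build, install, and reboot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test: does a kernel built at revision rev boot? */
typedef bool (*boots_fn)(long rev);

/*
 * Return the first failing revision in (good, bad], given that good boots
 * and bad does not.  Assumes the failure is monotonic: every revision
 * before the culprit boots and every revision from it onward panics.
 */
static long
first_bad_revision(long good, long bad, boots_fn boots)
{
	assert(good < bad);
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;
		if (boots(mid))
			good = mid;	/* culprit is after mid */
		else
			bad = mid;	/* culprit is mid or earlier */
	}
	return bad;
}

/* Simulation only: hard-coded to the outcome reported in this thread. */
static bool
simulated_boot(long rev)
{
	return rev < 331347;
}
```

Because `simulated_boot` is hard-coded to the result Andrew eventually reported (r331346 boots, r331347 panics), `first_bad_revision(331064, 331464, simulated_boot)` returns 331347; a 400-revision window needs at most nine probes. Subversion revision numbers are repository-wide, so many revisions in such a window may not touch head at all.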

Cheers,
-- 
Andrew

On 25 March 2018 1:14:40 am AEDT, Warner Losh  wrote:
>Also, what rev failed? I booted r331464 last night w/o issue.
>
>Warner
>
>On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly 
>wrote:
>
>> Hi all,
>>
>> For reasons that still escape me, I haven't been able to get a kernel
>> dump to debug, sorry.
>>
>> Just thought that I'd generate a fairly low-quality report, to see if
>> anyone has some ideas.
>>
>> The last kernel that I have that booted OK (and I'm now running) is:
>> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>>
>> The machine is a:
>> CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.46-MHz K8-class CPU)
>>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>>   Features=0x178bfbff> APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>
>> Kernels built from head as of a couple of hours ago get through
>> launching the other CPUs and then stop somewhere in random, apparently:
>>
>> SMP: AP CPU #2 Launched!
>> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
>> random: entpanic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/subr_bus.c:617
>> cpuid = 0
>> time = 1
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4507a0
>> vpanic() at vpanic+0x18d/frame 0xfe450800
>> doadump () at doadump/frame 0xfe450880
>> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
>> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
>> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
>> g_new_provider_event() at g_new_provider_event+0xfa/frame 0xfe450a30
>> g_run_events() at g_run_events+0x151/frame 0xfe450a70
>> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> KDB: enter: panic
>> [ thread pid 14 tid 100052 ]
>> Stopped at kdb_enter+0x3b: movq $0,kdb_why
>> db> dump
>> Cannot dump: no dump device specified.
>> db>
>>
>> Now dumping worked fine the last time the kernel panicked: I have
>> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
>> /dev/zvol/root/swap (second, larger than the first).
>>
>> Root on nvd0p2 is ZFS, and there's a four-drive raidZ with user
>> directories and what-not on them, and another ZFS on an external USB
>> drive that I use for backups, unmounted.
>>
>> In the new kernels, we clearly aren't even getting as far as finding
>> the hubs and controllers, let alone the drives.
>>
>> I've attached dmesg.boot from the last boot of last week's good kernel.
>> (While briefly in yoyo mode I turned SMT back on, so now there are 16
>> logical CPUs instead of the eight mentioned in the crash output.  Didn't
>> help, but I haven't turned it back off yet.)
>>
>> Cheers,
>>
>> Andrew


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Warner Losh
Also, what rev failed? I booted r331464 last night w/o issue.

Warner

On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly 
wrote:

> Hi all,
>
> For reasons that still escape me, I haven't been able to get a kernel dump
> to debug, sorry.
>
> Just thought that I'd generate a fairly low-quality report, to see if
> anyone has some ideas.
>
> The last kernel that I have that booted OK (and I'm now running) is:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>
> The machine is a:
> CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.46-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>
> Kernels built from head as of a couple of hours ago get through launching
> the other CPUs and then stop somewhere in random, apparently:
>
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
> random: entpanic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/subr_bus.c:617
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4507a0
> vpanic() at vpanic+0x18d/frame 0xfe450800
> doadump () at doadump/frame 0xfe450880
> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
> g_new_provider_event() at g_new_provider_event+0xfa/frame 0xfe450a30
> g_run_events() at g_run_events+0x151/frame 0xfe450a70
> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 14 tid 100052 ]
> Stopped at kdb_enter+0x3b: movq $0,kdb_why
> db> dump
> Cannot dump: no dump device specified.
> db>
>
> Now dumping worked fine the last time the kernel panicked: I have
> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
> /dev/zvol/root/swap (second, larger than the first).
>
> Root on nvd0p2 is ZFS, and there's a four-drive raidZ with user
> directories and what-not on them, and another ZFS on an external USB drive
> that I use for backups, unmounted.
>
> In the new kernels, we clearly aren't even getting as far as finding the
> hubs and controllers, let alone the drives.
>
> I've attached dmesg.boot from the last boot of last week's good kernel.
> (While briefly in yoyo mode I turned SMT back on, so now there are 16
> logical CPUs instead of the eight mentioned in the crash output.  Didn't
> help, but I haven't turned it back off yet.)
>
> Cheers,
>
> Andrew


Re: 12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Warner Losh
That lock has been there for a long, long time (like 5 or 6 major
releases)... It's surprising that it's causing issues now.

Can you bisect versions to find when this starts happening?

Warner

On Fri, Mar 23, 2018 at 9:56 PM, Andrew Reilly 
wrote:

> Hi all,
>
> For reasons that still escape me, I haven't been able to get a kernel dump
> to debug, sorry.
>
> Just thought that I'd generate a fairly low-quality report, to see if
> anyone has some ideas.
>
> The last kernel that I have that booted OK (and I'm now running) is:
> FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>
> The machine is a:
> CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.46-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   Features=0x178bfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>
> Kernels built from head as of a couple of hours ago get through launching
> the other CPUs and then stop somewhere in random, apparently:
>
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
> random: entpanic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/subr_bus.c:617
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4507a0
> vpanic() at vpanic+0x18d/frame 0xfe450800
> doadump () at doadump/frame 0xfe450880
> __mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
> devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
> g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
> g_new_provider_event() at g_new_provider_event+0xfa/frame 0xfe450a30
> g_run_events() at g_run_events+0x151/frame 0xfe450a70
> fork_exit() at fork_exit+0x84/frame 0xfe450ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 14 tid 100052 ]
> Stopped at kdb_enter+0x3b: movq $0,kdb_why
> db> dump
> Cannot dump: no dump device specified.
> db>
>
> Now dumping worked fine the last time the kernel panicked: I have
> dumpdev=AUTO in rc.conf and I have swap on nvd0p3 (first) and
> /dev/zvol/root/swap (second, larger than the first).
>
> Root on nvd0p2 is ZFS, and there's a four-drive raidZ with user
> directories and what-not on them, and another ZFS on an external USB drive
> that I use for backups, unmounted.
>
> In the new kernels, we clearly aren't even getting as far as finding the
> hubs and controllers, let alone the drives.
>
> I've attached dmesg.boot from the last boot of last week's good kernel.
> (While briefly in yoyo mode I turned SMT back on, so now there are 16
> logical CPUs instead of the eight mentioned in the crash output.  Didn't
> help, but I haven't turned it back off yet.)
>
> Cheers,
>
> Andrew


12-Current panics on boot (didn't a week ago.)

2018-03-24 Thread Andrew Reilly
Hi all,

For reasons that still escape me, I haven't been able to get a kernel dump to 
debug, sorry.

Just thought that I'd generate a fairly low-quality report, to see if anyone 
has some ideas.

The last kernel that I have that booted OK (and I'm now running) is:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

The machine is a:
CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.46-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff

Kernels built from head as of a couple of hours ago get through launching the
other CPUs and then stop somewhere in random, apparently:

SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 1497223020 Hz quality 1000
random: entpanic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/subr_bus.c:617
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4507a0
vpanic() at vpanic+0x18d/frame 0xfe450800
doadump () at doadump/frame 0xfe450880
__mtx_lock_flags() at __mtx_lock_flags+0x163/frame 0xfe4508d0
devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfe450900
g_dev_taste() at g_dev_taste+0x370/frame 0xfe450a10
g_new_provider_event() at g_new_provider_event+0xfa/frame 0xfe450a30
g_run_events() at g_run_events+0x151/frame 0xfe450a70
fork_exit() at fork_exit+0x84/frame 0xfe450ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe450ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 14 tid 100052 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> dump
Cannot dump: no dump device specified.
db> 
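The first line of that panic can be decoded. What follows is a user-space model, not kernel code, and it rests on assumptions about FreeBSD's lock internals of this era that should be checked against `sys/kern/subr_lock.c` and `sys/kern/kern_mutex.c`: each lock object records its lock class as an index in its flags, index 0 is the spin-mutex class, and `__mtx_lock_flags()` asserts that the lock it is handed is a sleep mutex, printing the lock's name if not. A `struct mtx` that was never initialized is all zeroes, so it would read as a spin mutex with a NULL name, which matches `mtx_lock() of spin mutex (null)`. On that reading, the panic suggests `devctl_queue_data_f()` reached its mutex before the mutex was initialized; it does not by itself say why. The types and functions below (`model_lock`, `model_zero_lock`, `model_mtx_init`, `model_mtx_lock_check`) are hypothetical, and the model omits the file:line suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a lock's class index; index 0 is the spin class. */
enum model_class { MODEL_CLASS_SPIN = 0, MODEL_CLASS_SLEEP = 1 };

struct model_lock {
	const char       *name;		/* NULL until initialized */
	enum model_class  class_idx;	/* 0 (spin) until initialized */
};

/* Models storage that is zero-filled but never passed to an init routine. */
static void
model_zero_lock(struct model_lock *m)
{
	memset(m, 0, sizeof(*m));
}

/* Models initializing a sleep mutex: records its name and class. */
static void
model_mtx_init(struct model_lock *m, const char *name)
{
	assert(name != NULL);
	m->name = name;
	m->class_idx = MODEL_CLASS_SLEEP;
}

/*
 * Models the sanity check made when a sleep mutex is acquired.  Returns 0
 * if the lock looks like a sleep mutex; otherwise writes a panic-style
 * message into msg and returns -1.
 */
static int
model_mtx_lock_check(const struct model_lock *m, char *msg, size_t msglen)
{
	if (m->class_idx != MODEL_CLASS_SLEEP) {
		snprintf(msg, msglen, "mtx_lock() of spin mutex %s",
		    m->name != NULL ? m->name : "(null)");
		return -1;
	}
	return 0;
}
```

Running `model_mtx_lock_check()` on a lock that only went through `model_zero_lock()` produces `mtx_lock() of spin mutex (null)`; after `model_mtx_init()` the same check passes.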

Now dumping worked fine the last time the kernel panicked: I have dumpdev=AUTO
in rc.conf and I have swap on nvd0p3 (first) and /dev/zvol/root/swap
(second, larger than the first).

Root on nvd0p2 is ZFS, and there's a four-drive raidZ with user directories
and what-not on them, and another ZFS on an external USB drive that I use
for backups, unmounted.

In the new kernels, we clearly aren't even getting as far as finding the hubs 
and controllers, let alone the drives.

I've attached dmesg.boot from the last boot of last week's good kernel.
(While briefly in yoyo mode I turned SMT back on, so now there are 16 logical
CPUs instead of the eight mentioned in the crash output.  Didn't help, but I
haven't turned it back off yet.)

Cheers,

Andrew

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1 r331064M: Sat Mar 17 07:54:51 AEDT 2018
root@Zen:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor   (2994.46-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff
  Features2=0x7ed8320b
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  Structured Extended Features=0x209c01a9
  XSAVE Features=0xf
  AMD Extended Feature Extensions ID EBX=0x7
  SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33272578048 (31731 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s) x 2 hardware threads
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x/0x1 (20180313/tbfadt-796)
ioapic0: Changing APIC ID to 17
ioapic1: Changing APIC ID to 18
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-55 on motherboard
SMP: AP CPU #12 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #7 Launched!
SMP: