Re: > r353680: multiuser crash due to: m_getzone: invalid cluster size 0
I discovered a similar kernel panic. To reproduce, just run CURRENT in bhyve with the e1000 network backend.

I think the problem is that debugnet_any_ifnet_update() calls iflib_debugnet_init() when the private driver data is not yet fully initialized.

sys/net/iflib.c:

iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
{
	if_ctx_t ctx;

	ctx = if_getsoftc(ifp);
	CTX_LOCK(ctx);
	*nrxr = NRXQSETS(ctx);
	*ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
	*clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;	/* ifl_buf_size is zero here! */
	CTX_UNLOCK(ctx);
}

So it seems that the ifnet_link_event EVENTHANDLER fires too early to initialize debugnet. Because ifl_buf_size is initialized from ctx->ifc_rx_mbuf_sz, which in turn is set by iflib_calc_rx_mbuf_sz(), I use the following patch as a workaround:

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index 73606981a492..1caf3505932a 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -6729,7 +6729,8 @@ iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
 	CTX_LOCK(ctx);
 	*nrxr = NRXQSETS(ctx);
 	*ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
-	*clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;
+	iflib_calc_rx_mbuf_sz(ctx);
+	*clsize = iflib_get_rx_mbuf_sz(ctx);
 	CTX_UNLOCK(ctx);
 }

em0: port 0x2000-0x2007 mem 0xc000-0xc001,0xc002-0xc002 irq 16 at device 2.0 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Ethernet address: 00:a0:98:b9:5c:99
em0: netmap queues/slots: TX 1/1024, RX 1/1024
virtio_pci0: port 0x2040-0x207f mem 0xc003-0xc0031fff irq 17 at device 3.0 on pci0
vtblk0: on virtio_pci0
vtblk0: 16384MB (33554432 512 byte sectors)
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
Unhandled ps2 mouse command 0xe1
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
vga0: at port 0x3b0-0x3bb iomem 0xb-0xb7fff pnpid PNP0900 on isa0
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
em0: link state changed to UP
panic: m_getzone: invalid cluster size 0
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0011b8d7f0
vpanic() at vpanic+0x17e/frame 0xfe0011b8d850
panic() at panic+0x43/frame 0xfe0011b8d8b0
debugnet_mbuf_reinit() at debugnet_mbuf_reinit+0x21b/frame 0xfe0011b8d8f0
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x107/frame 0xfe0011b8d940
do_link_state_change() at do_link_state_change+0x1b3/frame 0xfe0011b8d990
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfe0011b8d9f0
taskqueue_run() at taskqueue_run+0x4a/frame 0xfe0011b8da10
ithread_loop() at ithread_loop+0x1c6/frame 0xfe0011b8da70
fork_exit() at fork_exit+0x80/frame 0xfe0011b8dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0011b8dab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100010 ]
Stopped at kdb_enter+0x37: movq $0,0x1098a86(%rip)
db>
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
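The ordering problem described in the message above can be modelled in user space. What follows is only a rough sketch under stated assumptions: struct model_ctx, the MODEL_CL_* sizes and every model_* function are hypothetical stand-ins invented for illustration, not the iflib or mbuf code, and the size selection in model_calc_rx_mbuf_sz() is an assumed simplification of what iflib_calc_rx_mbuf_sz() does.

```c
/* Illustrative cluster sizes for this model only; they stand in for the
 * kernel's cluster-size constants and are not taken from its headers. */
enum {
	MODEL_CL_2K = 2048,
	MODEL_CL_PAGE = 4096
};

/* Hypothetical, stripped-down driver context. */
struct model_ctx {
	int max_frame_size;	/* known once the driver has attached */
	int rx_mbuf_sz;		/* stays 0 until the interface is initialized */
};

/* Returns 1 for a size the model allocator knows, 0 where the real
 * m_getzone() would panic with "invalid cluster size". */
int
model_zone_valid(int size)
{
	switch (size) {
	case MODEL_CL_2K:
	case MODEL_CL_PAGE:
		return (1);
	default:
		return (0);
	}
}

/* Assumed, simplified counterpart of iflib_calc_rx_mbuf_sz(): pick the
 * smallest model cluster that holds one frame. */
void
model_calc_rx_mbuf_sz(struct model_ctx *ctx)
{
	if (ctx->max_frame_size <= MODEL_CL_2K)
		ctx->rx_mbuf_sz = MODEL_CL_2K;
	else
		ctx->rx_mbuf_sz = MODEL_CL_PAGE;
}

/* Before the patch: report the cached size, even if it is still 0. */
int
model_clsize_cached(const struct model_ctx *ctx)
{
	return (ctx->rx_mbuf_sz);
}

/* With the workaround: compute the size first, then report it. */
int
model_clsize_computed(struct model_ctx *ctx)
{
	model_calc_rx_mbuf_sz(ctx);
	return (ctx->rx_mbuf_sz);
}
```

On a context whose rx_mbuf_sz is still 0, model_clsize_cached() returns 0, which model_zone_valid() rejects, while model_clsize_computed() returns a usable size no matter whether initialization has run yet. That is the whole point of the workaround: it removes the dependency on the order of events rather than changing when the link event fires.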
> r353680: multiuser crash due to: m_getzone: invalid cluster size 0
The last known good update of CURRENT on a Fujitsu Primergy RX2530-M5 (only one of two sockets equipped, 64 GB RAM) was on October 17th, 2019, before 15:00; I suppose that was r353680 at that time. Today's update to r353881 resulted in an immediate crash when the network (igb0-igb3: two built-in i350 NICs and two i350 NICs on an i350-T2 server adapter) comes up, just when the rc scripts configure the NICs.

The last message I see is something like "m_getzone: invalid cluster size 0" and "debugnet" or similar. The crash wrecked the installation: it seems that after updating, the UFS filesystem picked up inconsistencies, as so often, so I cannot start vi or other applications even after a full fsck -yf on all partitions. Those programs fail with some serious trap stating that the ELF is corrupt; I can't remember the exact message. We do not have debugging facilities enabled on that kernel, so I cannot provide more precise information.

For emergency rescue we downloaded the latest CURRENT memstick image, FreeBSD-13.0-CURRENT-amd64-20191018-r353709-memstick.img, dated October 18th, which also shows the bug described above.

It seems that I have to go back to the memstick image FreeBSD-13.0-CURRENT-amd64-20191011-r353427-memstick.img, which dates to October 11th, 2019. Since the crash seriously damaged the base filesystem and the installation, I first need to copy the installation tarballs from the install memstick into place and then try to rebuild the system with sources up to the version which is deemed working. Then I'll report, hopefully, more information.

Kind regards,
oh
Re: > r353680: multiuser crash due to: m_getzone: invalid cluster size 0
This is 241403 in bugzilla.

On Wed, Oct 23, 2019, 12:57 AM O. Hartmann wrote:

> [...]
> r353680: multiuser crash due to: m_getzone: invalid cluster size 0
Addendum:

r353680 works
r353709 doesn't work

[...]
/etc # more /var/crash/info.last
Dump header from device: /dev/da0p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 2952835072
  Blocksize: 512
  Compression: none
  Dumptime: Tue Oct 22 12:13:19 2019
  Hostname: wotan.lan101.bundesimmobilien.intern
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.0-CURRENT #11 r353877: Tue Oct 22 11:02:32 CEST 2019
    root@:/usr/obj/usr/src/amd64.amd64/sys/WOTAN
  Panic String: m_getzone: invalid cluster size 0
  Dump Parity: 2027469319
  Bounds: 0
  Dump Status: good
[...]

[...]

# more /var/crash/core.txt.0
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
[...]

[...]
---<>---
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #14 r353680: Wed Oct 23 08:50:04 CEST 2019
    root@wotan.lan101.bundesimmobilien.intern:/usr/obj/usr/src/amd64.amd64/sys/WOTAN amd64
FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on LLVM 9.0.0)
VT(efifb): resolution 1280x1024
CPU microcode: no matching update found
CPU: Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz (2993.05-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x50657 Family=0x6 Model=0x55 Stepping=7
  Features=0xbfebfbff
  Features2=0x7ffefbff
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0xd39b
  Structured Extended Features2=0x808
  Structured Extended Features3=0xbc000400
  XSAVE Features=0xf
  IA32_ARCH_CAPS=0x2b
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory = 68719476736 (65536 MB)
avail memory = 66361274368 (63287 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s) x 2 hardware threads
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
Security policy loaded: MAC/ntpd (mac_ntpd)
ioapic0 irqs 0-23
ioapic1 irqs 24-31
ioapic2 irqs 32-39
ioapic3 irqs 40-47
ioapic4 irqs 48-55
Launching APs: 1 13 5 12 9 14 8 7 10 6 11 15 3 4 2
Timecounter "TSC-low" frequency 1496523352 Hz quality 1000
random: entropy device external interface
kbd0 at kbdmux0
[...]