Re: > r353680: multiuser crash due to: m_getzone: Inavlid cluster size 0

2019-10-24 Thread Fedorov, Aleksandr
I discovered a similar kernel panic.

To reproduce, just run CURRENT in bhyve with the e1000 network backend.

I think the problem is that the debugnet_any_ifnet_update() function calls
iflib_debugnet_init() when the private driver data is not yet fully
initialized.

sys/net/iflib.c:
iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
{
	if_ctx_t ctx;

	ctx = if_getsoftc(ifp);
	CTX_LOCK(ctx);
	*nrxr = NRXQSETS(ctx);
	*ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
	*clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;	/* <-- ifl_buf_size is zero here! */
	CTX_UNLOCK(ctx);
}

So it seems that the ifnet_link_event EVENTHANDLER fires too early to
initialize debugnet.
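
To illustrate the suspected ordering problem, here is a minimal user-space
sketch. It is not iflib code: the struct and function names (fl_model,
ctx_model, model_attach, model_init, model_debugnet_init) are hypothetical, and
it assumes, as described above, that the buffer size is only filled in by a
later init step while the descriptor count is already known.

```c
#include <assert.h>

/* Hypothetical stand-in for an iflib free list. */
struct fl_model {
    int size;     /* number of receive buffers */
    int buf_size; /* cluster size; stays 0 until the init step has run */
};

/* Hypothetical stand-in for the driver context. */
struct ctx_model {
    struct fl_model fl;
};

/* Attach: the descriptor count is known, the buffer size is not yet set. */
static void
model_attach(struct ctx_model *ctx, int nrxd)
{
    ctx->fl.size = nrxd;
    ctx->fl.buf_size = 0;
}

/* Init: runs later and finally fills in the buffer size. */
static void
model_init(struct ctx_model *ctx, int clsize)
{
    assert(clsize > 0);
    ctx->fl.buf_size = clsize;
}

/* What a link-event handler would read, mirroring iflib_debugnet_init(). */
static void
model_debugnet_init(const struct ctx_model *ctx, int *ncl, int *clsize)
{
    *ncl = ctx->fl.size;
    *clsize = ctx->fl.buf_size;
}
```

If model_debugnet_init() runs between model_attach() and model_init(), it
reports a valid buffer count but a cluster size of 0, which matches the value
seen in the panic.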

Because ifl_buf_size is initialized from ctx->ifc_rx_mbuf_sz, which in turn is
set by iflib_calc_rx_mbuf_sz(), I use the following patch as a workaround:

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index 73606981a492..1caf3505932a 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -6729,7 +6729,8 @@ iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
 	CTX_LOCK(ctx);
 	*nrxr = NRXQSETS(ctx);
 	*ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
-	*clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;
+	iflib_calc_rx_mbuf_sz(ctx);
+	*clsize = iflib_get_rx_mbuf_sz(ctx);
 	CTX_UNLOCK(ctx);
 }
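
The idea behind the workaround, calculating the size before reading it rather
than trusting a field that may not have been filled in yet, can be sketched in
user space as follows. Everything here is an assumption for illustration: the
names (wa_ctx, model_calc_rx_mbuf_sz, model_debugnet_clsize), the two cluster
sizes, and the selection rule are not copied from iflib.

```c
#include <assert.h>

/* Cluster sizes as commonly found on amd64; assumptions for this sketch. */
#define MODEL_MCLBYTES     2048
#define MODEL_MJUMPAGESIZE 4096

/* Hypothetical stand-in for the driver context. */
struct wa_ctx {
    int max_frame_size; /* known once the driver has attached */
    int rx_mbuf_sz;     /* cached result; 0 until calculated */
};

/* Hypothetical counterpart of iflib_calc_rx_mbuf_sz(): pick a cluster size. */
static void
model_calc_rx_mbuf_sz(struct wa_ctx *ctx)
{
    if (ctx->max_frame_size <= MODEL_MCLBYTES)
        ctx->rx_mbuf_sz = MODEL_MCLBYTES;
    else
        ctx->rx_mbuf_sz = MODEL_MJUMPAGESIZE;
}

/* Query used by the debugnet path: calculate first, then read. */
static int
model_debugnet_clsize(struct wa_ctx *ctx)
{
    model_calc_rx_mbuf_sz(ctx);
    assert(ctx->rx_mbuf_sz != 0);
    return (ctx->rx_mbuf_sz);
}
```

With this ordering the query cannot return 0, regardless of whether the
regular init path has run yet.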

em0:  port 0x2000-0x2007 mem 0xc000-0xc001,0xc002-0xc002 irq 16 at device 2.0 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Ethernet address: 00:a0:98:b9:5c:99
em0: netmap queues/slots: TX 1/1024, RX 1/1024
virtio_pci0:  port 0x2040-0x207f mem 0xc003-0xc0031fff irq 17 at device 3.0 on pci0
vtblk0:  on virtio_pci0
vtblk0: 16384MB (33554432 512 byte sectors)
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
Unhandled ps2 mouse command 0xe1
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
vga0:  at port 0x3b0-0x3bb iomem 0xb-0xb7fff pnpid PNP0900 on isa0
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
em0: link state changed to UP
panic: m_getzone: invalid cluster size 0
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0011b8d7f0
vpanic() at vpanic+0x17e/frame 0xfe0011b8d850
panic() at panic+0x43/frame 0xfe0011b8d8b0
debugnet_mbuf_reinit() at debugnet_mbuf_reinit+0x21b/frame 0xfe0011b8d8f0
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x107/frame 0xfe0011b8d940
do_link_state_change() at do_link_state_change+0x1b3/frame 0xfe0011b8d990
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfe0011b8d9f0
taskqueue_run() at taskqueue_run+0x4a/frame 0xfe0011b8da10
ithread_loop() at ithread_loop+0x1c6/frame 0xfe0011b8da70
fork_exit() at fork_exit+0x80/frame 0xfe0011b8dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0011b8dab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100010 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1098a86(%rip)
db> 
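
The panic string suggests that m_getzone() accepts only a fixed set of cluster
sizes and treats anything else, including 0, as fatal. A user-space sketch of
that kind of check follows. The sizes are the usual amd64 values and, like the
names and zone labels, should be read as assumptions; where the kernel panics,
this model returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Cluster sizes as on amd64 (4 KB pages); assumptions for this sketch. */
enum {
    MODEL_MCLBYTES = 2048,
    MODEL_MJUMPAGESIZE = 4096,
    MODEL_MJUM9BYTES = 9 * 1024,
    MODEL_MJUM16BYTES = 16 * 1024
};

/* A switch cannot have two equal case labels, so these sizes must differ. */
static_assert(MODEL_MJUMPAGESIZE != MODEL_MCLBYTES,
    "page-sized and standard clusters must differ");

/*
 * Return a label for the zone that would serve the given cluster size,
 * or NULL for a size that no zone serves.
 */
static const char *
model_getzone(int size)
{
    switch (size) {
    case MODEL_MCLBYTES:
        return ("standard cluster");
    case MODEL_MJUMPAGESIZE:
        return ("page-sized jumbo");
    case MODEL_MJUM9BYTES:
        return ("9k jumbo");
    case MODEL_MJUM16BYTES:
        return ("16k jumbo");
    default:
        return (NULL); /* the kernel would panic here instead */
    }
}
```

A size of 0, as handed over from the uninitialized ifl_buf_size, falls through
to the default case.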
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


> r353680: multiuser crash due to: m_getzone: Inavlid cluster size 0

2019-10-24 Thread O. Hartmann
The last known good update of CURRENT on a Fujitsu Primergy RX2530-M5 (only one
of two sockets equipped, 64 GB RAM) was on October 17th, 2019, before 15:00; I
suppose that was r353680 at the time. Today's update to r353881 resulted in an
immediate crash when the network (igb0-igb3: two built-in i350 NICs and two
i350 NICs on an i350-T2 server adapter) comes up, just when the rc scripts
configure the NICs.

The last message I see is something like "m_getzone: invalid cluster size 0"
and "debugnet" or similar. The crash wrecked the installation: it seems that
after updating, the UFS filesystem was left, as so often, with inconsistencies,
so I cannot start vi or other applications even after a full fsck -yf on all
partitions. Those programs fail with some serious trap stating that the ELF is
corrupt; I can't remember the exact message. We do not have debugging
facilities enabled on that kernel, so I cannot provide more detailed
information.

For emergency rescue we downloaded the latest CURRENT memstick image,
FreeBSD-13.0-CURRENT-amd64-20191018-r353709-memstick.img, dated October 18th,
which also shows the bug described above.

It seems that I have to go back to the memstick image
FreeBSD-13.0-CURRENT-amd64-20191011-r353427-memstick.img, which dates from
October 11th, 2019.
Since the crash seriously damaged the base filesystem and the installation, I
first need to copy the installation tarballs from the install memstick into
place and then try to rebuild the system with sources up to the version that is
deemed working. Then I'll report, hopefully, more information.

Kind regards,
oh


Re: > r353680: multiuser crash due to: m_getzone: Inavlid cluster size 0

2019-10-23 Thread Navdeep Parhar
This is bug 241403 in Bugzilla.



On Wed, Oct 23, 2019, 12:57 AM O. Hartmann  wrote:

> [...]

> r353680: multiuser crash due to: m_getzone: Inavlid cluster size 0

2019-10-23 Thread O. Hartmann
The last known good update of CURRENT on a Fujitsu Primergy RX2530-M5 (only one
of two sockets equipped, 64 GB RAM) was on October 17th, 2019, before 15:00; I
suppose that was r353680 at the time. Today's update to r353881 resulted in an
immediate crash when the network (igb0-igb3: two built-in i350 NICs and two
i350 NICs on an i350-T2 server adapter) comes up, just when the rc scripts
configure the NICs.

The last message I see is something like "m_getzone: invalid cluster size 0"
and "debugnet" or similar. The crash wrecked the installation: it seems that
after updating, the UFS filesystem was left, as so often, with inconsistencies,
so I cannot start vi or other applications even after a full fsck -yf on all
partitions. Those programs fail with some serious trap stating that the ELF is
corrupt; I can't remember the exact message. We do not have debugging
facilities enabled on that kernel, so I cannot provide more detailed
information.

For emergency rescue we downloaded the latest CURRENT memstick image,
FreeBSD-13.0-CURRENT-amd64-20191018-r353709-memstick.img, dated October 18th,
which also shows the bug described above.

It seems that I have to go back to the memstick image
FreeBSD-13.0-CURRENT-amd64-20191011-r353427-memstick.img, which dates from
October 11th, 2019.
Since the crash seriously damaged the base filesystem and the installation, I
first need to copy the installation tarballs from the install memstick into
place and then try to rebuild the system with sources up to the version that is
deemed working. Then I'll report, hopefully, more information.

Kind regards,
oh

Addendum:

r353680 works
r353709 doesn't work
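
With one working and one failing revision known, the remaining window can be
narrowed by bisection. The following is a generic sketch, not an existing tool:
first_bad_revision and the rev_ok_fn callback are hypothetical names, and the
callback stands for "build and boot this revision, report whether it works".
It assumes that every revision before the culprit works and every revision
from the culprit on fails.

```c
#include <assert.h>

/* Hypothetical predicate: returns nonzero if the given revision works. */
typedef int (*rev_ok_fn)(long rev, void *arg);

/*
 * Given a known-good and a known-bad revision (good < bad), return the
 * first bad revision. *tests receives the number of revisions tried.
 */
static long
first_bad_revision(long good, long bad, rev_ok_fn ok, void *arg, int *tests)
{
    assert(good < bad);
    *tests = 0;
    while (bad - good > 1) {
        long mid = good + (bad - good) / 2;

        (*tests)++;
        if (ok(mid, arg))
            good = mid; /* the culprit is later than mid */
        else
            bad = mid;  /* mid already fails */
    }
    return (bad);
}

/* Example predicate for experiments: revisions below *(long *)arg work. */
static int
ok_below(long rev, void *arg)
{
    return (rev < *(long *)arg);
}
```

For the window r353680..r353709 this needs at most five test builds. The
ok_below() helper only simulates a culprit for trying the function out; it says
nothing about which commit actually introduced the panic.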


[...]
/etc # more /var/crash/info.last
Dump header from device: /dev/da0p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 2952835072
  Blocksize: 512
  Compression: none
  Dumptime: Tue Oct 22 12:13:19 2019
  Hostname: wotan.lan101.bundesimmobilien.intern
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.0-CURRENT #11 r353877: Tue Oct 22 11:02:32 CEST 2019
    root@:/usr/obj/usr/src/amd64.amd64/sys/WOTAN
  Panic String: m_getzone: invalid cluster size 0
  Dump Parity: 2027469319
  Bounds: 0
  Dump Status: good
[...]

[...]

 # more /var/crash/core.txt.0
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
/dev/stdin:1: Error in sourced command file:
Cannot access memory at address 0x65657246
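
One observation on the address in these errors: read as little-endian bytes,
0x65657246 is the ASCII text "Free". That may mean the debugger followed string
data (perhaps the start of "FreeBSD") as if it were a pointer, for instance
because the kernel and its debug symbols did not match; this is a guess, not
something the output confirms. The decoding itself can be checked with a small
helper (addr_to_ascii is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the four bytes of a 32-bit value, in little-endian memory order,
 * into out[] as a NUL-terminated string. Non-printable bytes become '.'.
 */
static void
addr_to_ascii(uint32_t addr, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((addr >> (8 * i)) & 0xff);

        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

Calling addr_to_ascii(0x65657246, buf) leaves "Free" in buf.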


[...]


[...]
---<<BOOT>>---
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #14 r353680: Wed Oct 23 08:50:04 CEST 2019
    root@wotan.lan101.bundesimmobilien.intern:/usr/obj/usr/src/amd64.amd64/sys/WOTAN amd64
FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on LLVM 9.0.0)
VT(efifb): resolution 1280x1024
CPU microcode: no matching update found
CPU: Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz (2993.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50657  Family=0x6  Model=0x55  Stepping=7
  Features=0xbfebfbff
  Features2=0x7ffefbff
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0xd39b
  Structured Extended Features2=0x808
  Structured Extended Features3=0xbc000400
  XSAVE Features=0xf
  IA32_ARCH_CAPS=0x2b
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 68719476736 (65536 MB)
avail memory = 66361274368 (63287 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s) x 2 hardware threads
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
Security policy loaded: MAC/ntpd (mac_ntpd)
ioapic0  irqs 0-23
ioapic1  irqs 24-31
ioapic2  irqs 32-39
ioapic3  irqs 40-47
ioapic4  irqs 48-55
Launching APs: 1 13 5 12 9 14 8 7 10 6 11 15 3 4 2
Timecounter "TSC-low" frequency 1496523352 Hz quality 1000
random: entropy device external interface
kbd0 at kbdmux0
[...]