Hi!
Long story short: it looks like r360843 can lead to a kernel panic during disk
initialization in 11.4-STABLE (12-STABLE should be affected too, but this
has not been tested).
Long story longer: after a routine upgrade from 11.2-STABLE to 11.4-STABLE,
the host panics during disk initialization. Hardware: Dell R530, onboard
PERC adapter with two drives exported to the system as JBOD / SYSPD:
mfi0 Adapter:
Product Name: PERC H730 Mini
Serial Number: 83M024U
Firmware: 25.5.5.0005
mfi0 Configuration: 0 arrays, 0 volumes, 0 spares
mfi0 Physical Drives:
0 ( 447G) JBOD SATA E1:S0
1 ( 447G) JBOD SATA E1:S1
(a ZFS mirror is no worse than a PERC one).
=== console log starts (a bit garbled by interleaved output from other devices' initialization)
mfisyspd0 numa-domain 0 on mfi0
mfisyspd0: 457862MB (937703088 sectors) SYSPD volume (deviceid: 0)
mfisyspd0: SYSPD volume attached
mfi0: DJA NA XXX SYSPDIO
ses0 at ahciem0 bus 0 scbus4 target 0 lun 0
Fatal trap 12: page fault while in kernel mode
uhub0: ses0: SEMB S-E-S 2.00 device
ses0: SEMB SES Device
cpuid = 18; on usbus0
apic id = 18
ses1 at ahciem1 bus 0 scbus11 target 0 lun 0
ses1: uhub1: on usbus1
fault virtual address = 0x0
SEMB S-E-S 2.00 device
ses1: SEMB SES Device
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x803daacb
pass1 at ahcich9 bus 0 scbus10 target 0 lun 0
pass1: Removable CD-ROM SCSI device
stack pointer = 0x28:0xfe07c2f457f0
frame pointer = 0x28:0xfe07c2f45820
pass1: Serial Number JD6H1PLC0084O50KOA00
pass1: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
ses1: pass1 in 'Slot 05', SATA Slot: scbus10 target 0
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13 (g_down)
trap number = 12
panic: page fault
cpuid = 18
KDB: stack backtrace:
#0 0x805e5a45 at kdb_backtrace+0x65
#1 0x8059fd7e at vpanic+0x15e
#2 0x8059fc13 at panic+0x43
#3 0x80861515 at trap_fatal+0x365
#4 0x80861569 at trap_pfault+0x49
#5 0x80860c1e at trap+0x27e
#6 0x808427af at calltrap+0x8
#7 0x803d39f8 at mfi_send_frame+0x28
#8 0x803d395f at mfi_data_cb+0x2bf
#9 0x805de0be at bus_dmamap_load_bio+0xae
#10 0x803d351e at mfi_mapcmd+0xae
#11 0x803d292b at mfi_startio+0xeb
#12 0x803d8a39 at mfi_syspd_strategy+0x99
#13 0x804f8c99 at g_disk_start+0x369
#14 0x804fc3c3 at g_io_schedule_down+0x173
#15 0x804fcc5c at g_down_procbody+0x6c
#16 0x8056b0de at fork_exit+0x7e
#17 0x808437ce at fork_trampoline+0xe
Uptime: 1s
=== console log ends
This line from the log
mfi0: DJA NA XXX SYSPDIO
suggests that instead of proceeding to initialize req_desc (line 1110:
https://svnweb.freebsd.org/base/stable/11/sys/dev/mfi/mfi_tbolt.c?revision=360843=markup#l1110),
the code just prints this message and continues to MFI_WRITE (line 1141)
with req_desc still NULL, as initialized at line 1093.
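The control flow described above can be modelled in user space to make the failure path explicit. This is a minimal sketch, not the driver code: `struct req_desc`, `build_mpt_cmd()`, `pd_scsi_io_desc()` and `would_fault()` are hypothetical stand-ins, and it assumes (as the log suggests) that r360843 turned the `||` at line 1110 into `&&`. Instead of dereferencing NULL, it reports whether the later MFI_WRITE step would receive a NULL descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* SCSI READ(10) / WRITE(10) opcodes, as tested at mfi_tbolt.c:1110. */
#define SCSI_READ_10  0x28
#define SCSI_WRITE_10 0x2A

/* The "|| is always true" argument relies on the two opcodes differing. */
static_assert(SCSI_READ_10 != SCSI_WRITE_10, "opcodes must differ");

/* Hypothetical stand-in for union mfi_mpi2_request_descriptor. */
struct req_desc {
    uint64_t words;
};

static struct req_desc model_desc = { 0x1234 };

/* Hypothetical stand-in for mfi_tbolt_build_mpt_cmd(); always succeeds here. */
static struct req_desc *build_mpt_cmd(void)
{
    return &model_desc;
}

/*
 * Models the MFI_CMD_PD_SCSI_IO branch for a given cdb[0].
 * use_and = true  -> assumed post-r360843 condition (&&)
 * use_and = false -> pre-r360843 condition (||), always true
 * Returns the descriptor that would later be used by MFI_WRITE.
 */
static struct req_desc *pd_scsi_io_desc(uint8_t opcode, bool use_and)
{
    struct req_desc *req_desc = NULL;   /* initial value, cf. line 1093 */
    bool build = use_and
        ? (opcode != SCSI_READ_10 && opcode != SCSI_WRITE_10)
        : (opcode != SCSI_READ_10 || opcode != SCSI_WRITE_10);

    if (build)
        req_desc = build_mpt_cmd();
    else
        printf("mfi0: DJA NA XXX SYSPDIO\n");   /* else branch only logs */

    return req_desc;
}

/*
 * Per the analysis above, MFI_WRITE (line 1141) uses req_desc without a
 * NULL check; this reports whether that use would see NULL.
 */
static bool would_fault(uint8_t opcode, bool use_and)
{
    return pd_scsi_io_desc(opcode, use_and) == NULL;
}
```

With the `&&` form, `would_fault(0x28, true)` and `would_fault(0x2A, true)` are true, i.e. every plain READ(10)/WRITE(10) to a SYSPD disk takes the log-only branch, which matches the panic in g_down right after the SYSPD volume attaches.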
Manually reverting the mentioned patch leads to the following warning
during compilation:
cc -target x86_64-unknown-freebsd11.4 --sysroot=/usr/obj/usr/src/tmp
-B/usr/obj/usr/src/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing -g -nostdinc
-I. -I/usr/src/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer -MD -MF.depend.mfi_tbolt.o -MTmfi_tbolt.o
-mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector
-gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
-Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs
-fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare
-Wno-error-empty-body -Wno-error-parentheses-equality
-Wno-error-unused-function -Wno-error-pointer-sign
-Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes
-mno-avx -std=iso9899:1999 -Werror /usr/src/sys/dev/mfi/mfi_tbolt.c
/usr/src/sys/dev/mfi/mfi_tbolt.c:1110:22: warning: overlapping comparisons
always evaluate to true [-Wtautological-overlap-compare]
if (cdb[0] != 0x28 || cdb[0] != 0x2A) {
~~~^
1 warning generated.
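To see why clang calls the condition at line 1110 a tautology, both forms can be evaluated over every possible cdb[0] byte. A minimal sketch; the function names are hypothetical, only the opcode values 0x28 (READ(10)) and 0x2A (WRITE(10)) come from the source, and the `&&` form is an assumption about what r360843 introduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCSI_READ_10  0x28   /* READ(10)  */
#define SCSI_WRITE_10 0x2A   /* WRITE(10) */

/* With two distinct constants, "x != a || x != b" can never be false. */
static_assert(SCSI_READ_10 != SCSI_WRITE_10, "opcodes must differ");

/* Form flagged by -Wtautological-overlap-compare (pre-r360843). */
static bool cond_or(uint8_t op)
{
    return op != SCSI_READ_10 || op != SCSI_WRITE_10;
}

/* Assumed post-r360843 form: false exactly for READ(10) and WRITE(10). */
static bool cond_and(uint8_t op)
{
    return op != SCSI_READ_10 && op != SCSI_WRITE_10;
}

/* Count how many of the 256 possible cdb[0] values make cond() false. */
static int count_false(bool (*cond)(uint8_t))
{
    int n = 0;
    for (int op = 0; op <= UINT8_MAX; op++)
        if (!cond((uint8_t)op))
            n++;
    return n;
}
```

`count_false(cond_or)` is 0 and `count_false(cond_and)` is 2: the old code built a request descriptor for every opcode, while the new one skips it precisely for ordinary READ(10)/WRITE(10) I/O.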
However, the system boots and works just fine (all values of cdb[0] are now
translated into a correct req_desc).
An attempt to return an error when cdb[0] is 0x28 or 0x2A leads to numerous
read errors in the console log and an inability to boot (GEOM thinks the
gpart partition table is broken, ZFS is unable to find the pool), so this is
not an option:
[..]
mfisyspd0 numa-domain 0 on mfi0
mfisyspd0: 457862MB (937703088 sectors)