Hi,

I’ve been trying to debug the PANIC and OEM string handling and I’m running out of ideas: I can’t tell whether this is a bug or whether something subtle has changed in my config that I’m just not seeing.
(Note: I’m willing to pay for consulting.)

I have machines that we’ve moved from an older setup (Gentoo, (mostly) vanilla kernel 4.19.157) to a newer setup (NixOS, (mostly) vanilla kernel 5.10.159) and I’m now experiencing crashes that seem to be kernel panics but do not get the usual messages in the IPMI SEL.

The kernel does include the necessary drivers, the watchdog is active and the SEL shows the watchdog action. I have reason to think that it’s a panic because the typical behaviour of the timeout jumping to 255 happens.

Here’s the IPMI-related config and cmdline from the old kernel where it works:

BOOT_IMAGE=/kernel-genkernel-x86_64-4.19.157 root=/dev/vgsys/root ro rootfstype=ext4 dolvm ipmi_watchdog.timeout=60 igb.InterruptThrottleRate=1 ixgbe.InterruptThrottleRate=1 console=ttyS2,57600

# CONFIG_ACPI_IPMI is not set
CONFIG_IPMI_HANDLER=y
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
# CONFIG_IPMI_SSIF is not set
CONFIG_IPMI_WATCHDOG=y
CONFIG_IPMI_POWEROFF=y

On that system (as everything is statically compiled) the lsmod is empty WRT ipmi and the kernel log shows:

[ 4.374757] ipmi device interface
[ 4.389388] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 4.402087] ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[ 4.413907] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 4.425570] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[ 4.437408] ipmi_si IPI0001:00: [io 0x0ca2] regsize 1 spacing 1 irq 0
[ 4.450449] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[ 4.467818] ipmi_si: Adding ACPI-specified kcs state machine
[ 4.479139] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
[ 4.567613] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 4.617693] ipmi_si IPI0001:00: Found new BMC (man_id: 0x002a7c, prod_id: 0x0624, dev_id: 0x20)
[ 4.671871] ipmi_si IPI0001:00: IPMI kcs interface initialized

And here’s the controller info:

Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 2.24
IPMI Version              : 2.0
Manufacturer ID           : 10876
Manufacturer Name         : Supermicro
Product ID                : 1572 (0x0624)
Product Name              : Unknown (0x624)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
Aux Firmware Rev Info     :
    0x06
    0x00
    0x00
    0x00

And here’s the NixOS environment where it doesn’t work:

BOOT_IMAGE=/kernels/qy42jhicvvqb0p7x2h0i46b2x0f1w74q-linux-5.10.159-bzImage init=/nix/store/qx33nyr0f60y76yzmbgsikxr60lqzdb3-nixos-system-...-21.05pre-git/init dolvm ipmi_watchdog.timeout=60 igb.InterruptThrottleRate=1 ixgbe.InterruptThrottleRate=1 panic=1 boot.panic_on_fail systemd.journald.forward_to_console=no systemd.log_target=kmsg console=ttyS1,115200 loglevel=7

CONFIG_ACPI_IPMI=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m

On the newer system this is what appears in the kernel log:

[ 22.070935] ipmi device interface
[ 22.086353] systemd-modules-load[572]: Inserted module 'ipmi_watchdog'
[ 22.904717] ipmi_si: IPMI System Interface driver
[ 22.911022] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 22.917393] ipmi_platform: ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 0
[ 22.925092] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 22.931023] ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 0
[ 23.119892] ipmi_si dmi-ipmi-si.0: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[ 23.438469] ipmi_si dmi-ipmi-si.0: IPMI kcs interface initialized
[ 23.441630] ipmi_ssif: IPMI SSIF Interface driver

And the ipmi-related modules look like this:

ipmi_ssif        40960  0
ipmi_si          73728  1
ipmi_watchdog    32768  1
ipmi_devintf     20480  0
ipmi_msghandler  73728  4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif
i2c_core        102400  5 drm_kms_helper,i2c_algo_bit,mgag200,ipmi_ssif,drm

In this case it’s a DELL IPMI controller:

Device ID                 : 32
Device Revision           : 0
Firmware Revision         : 1.52
IPMI Version              : 2.0
Manufacturer ID           : 674
Manufacturer Name         : DELL Inc
Product ID                : 256 (0x0100)
Product Name              : Unknown (0x100)
Device Available          : yes
Provides Device SDRs      : yes
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    Bridge
    Chassis Device
Aux Firmware Rev Info     :
    0x00
    0x0a
    0x00
    0x00

But the behaviour has been the same on various SuperMicro machines.

So, after running out of ideas about what to look for, I’m left with these questions:

1. When I trigger a panic manually via “echo c > /proc/sysrq-trigger”, that should also create a panic message that appears in the SEL, right?
2. Is there anything that comes to mind that I could have configured incorrectly in the kernel?
3. Or is there anything I can inspect after boot to know which setting the “panic_op” has?

I have running systems with both the old and new setups available that I can freely poke to analyze it interactively.

Any help is appreciated as I’ve run out of ideas and (just to make sure) I’m happy to pay proper consulting rates (especially happy to support open source work).

Kind regards,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Germany
HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick
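
P.S. To make the difference between the two kernel configs easier to see, here is a minimal sketch that diffs the two IPMI config excerpts quoted above. The helper names (parse_config, diff_configs) and the "n"/"absent" markers are made up for illustration; the option values are copied from this mail. It only compares the quoted options and makes no claim about which difference, if any, affects the panic path.

```python
# Sketch: diff the IPMI-related kernel options of the old and new setups.
# The config text is copied from the mail; the helpers are illustrative.

OLD_CONFIG = """\
# CONFIG_ACPI_IPMI is not set
CONFIG_IPMI_HANDLER=y
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
# CONFIG_IPMI_SSIF is not set
CONFIG_IPMI_WATCHDOG=y
CONFIG_IPMI_POWEROFF=y
"""

NEW_CONFIG = """\
CONFIG_ACPI_IPMI=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
"""

NOT_SET_SUFFIX = " is not set"


def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(old, new):
    """Return {name: (old_value, new_value)} for options that differ."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    changes = diff_configs(parse_config(OLD_CONFIG), parse_config(NEW_CONFIG))
    for name, (before, after) in changes.items():
        print(f"{name}: {before} -> {after}")
```

For the excerpts above, CONFIG_IPMI_PANIC_EVENT and CONFIG_IPMI_PANIC_STRING are =y in both configs, so they do not show up in the diff. What does show up is that the handler, SI, device interface, watchdog and poweroff drivers went from built-in to modules, and that ACPI_IPMI and SSIF are now enabled as modules.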