I’m going to attach a serial console to watch the “echo c” panic; maybe that gives _some_ indication.
Otherwise: I can quickly run patches on the kernel there to try things out. (And the funding offer still stands.)

Christian

> On 1. Mar 2023, at 17:58, Corey Minyard <miny...@acm.org> wrote:
>
> On Tue, Feb 28, 2023 at 06:36:17PM +0100, Christian Theune wrote:
>> Thanks, both machines report:
>>
>> # cat /sys/module/ipmi_msghandler/parameters/panic_op
>> string
>
> At this point, I have no idea. I'd have to start adding printks into
> the code and cause crashes to see what is happening.
>
> Maybe something is getting in the way of the panic notifiers and doing
> something to prevent the IPMI driver from working.
>
> -corey
>
>>
>>
>>> On 28. Feb 2023, at 18:04, Corey Minyard <miny...@acm.org> wrote:
>>>
>>> Oh, I forgot. You can look at panic_op in
>>> /sys/module/ipmi_msghandler/parameters/panic_op
>>>
>>> -corey
>>>
>>> On Tue, Feb 28, 2023 at 05:48:07PM +0100, Christian Theune via
>>> Openipmi-developer wrote:
>>>> Hi,
>>>>
>>>>> On 28. Feb 2023, at 17:36, Corey Minyard <miny...@acm.org> wrote:
>>>>>
>>>>> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian Theune via
>>>>> Openipmi-developer wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I’ve been trying to debug the PANIC and OEM string handling and am
>>>>>> running out of ideas whether this is a bug or whether something so
>>>>>> subtle has changed in my config that I’m just not seeing it.
>>>>>>
>>>>>> (Note: I’m willing to pay for consulting.)
>>>>>
>>>>> Probably not necessary.
>>>>
>>>> Thanks! The offer always stands. If we should ever meet I’m also able to
>>>> pay in beverages. ;)
>>>>
>>>>>> I have machines that we’ve moved from an older setup (Gentoo, (mostly)
>>>>>> vanilla kernel 4.19.157) to a newer setup (NixOS, (mostly) vanilla
>>>>>> kernel 5.10.159) and I’m now experiencing crashes that seem to be kernel
>>>>>> panics but do not get the usual messages in the IPMI SEL.
>>>>>
>>>>> I just tested on stock 5.10.159 and it worked without issue. Everything
>>>>> you have below looks ok.
>>>>>
>>>>> Can you test by causing a crash with:
>>>>>
>>>>> echo c >/proc/sysrq-trigger
>>>>>
>>>>> and see if it works?
>>>>
>>>> Yeah, already tried that and unfortunately that _doesn’t_ work.
>>>>
>>>>> It sounds like you are having some type of crash that you would normally
>>>>> use the IPMI logs to debug. However, they aren't perfect; the system
>>>>> has to stay up long enough to get them into the event log.
>>>>
>>>> I think they are staying up long enough because a panic triggers the 255
>>>> second bump in the watchdog and only then pass on. However, I’ve also
>>>> noticed that the kernel _should_ be rebooting after a panic much faster
>>>> (and not rely on the watchdog) and that doesn’t happen either. (Sorry this
>>>> just popped from the back of my head).
>>>>
>>>>> In this situation, getting a serial console (probably through IPMI
>>>>> Serial over LAN) and getting the console output on a crash is probably
>>>>> your best option. You can use ipmitool for this, or I have a library
>>>>> that is able to make connections to serial ports, including through IPMI
>>>>> SoL.
>>>>
>>>> Yup. Been there, too. :)
>>>>
>>>> Unfortunately we’re currently chasing something that pops up very randomly
>>>> on somewhat odd machines and I also have the feeling that it’s
>>>> systematically broken right now (as the “echo c” doesn’t work).
>>>>
>>>> Thanks a lot,
>>>> Christian
>>>>
>>>> --
>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
>>>> HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian
>>>> Zagrodnick
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Openipmi-developer mailing list
>>>> Openipmi-developer@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
>>
>> Kind regards,
>> Christian Theune
>>
>> --
>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
>> HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian
>> Zagrodnick
>>

Kind regards,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick

_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer