Hi,

> On 28. Feb 2023, at 17:36, Corey Minyard <miny...@acm.org> wrote:
> 
> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian Theune via 
> Openipmi-developer wrote:
>> Hi,
>> 
>> I’ve been trying to debug the PANIC and OEM string handling and am running 
>> out of ideas whether this is a bug or whether something so subtle has 
>> changed in my config that I’m just not seeing it.
>> 
>> (Note: I’m willing to pay for consulting.)
> 
> Probably not necessary.

Thanks! The offer always stands. If we should ever meet I’m also able to pay in 
beverages. ;)

>> I have machines that we’ve moved from an older setup (Gentoo, (mostly) 
>> vanilla kernel 4.19.157) to a newer setup (NixOS, (mostly) vanilla kernel 
>> 5.10.159) and I’m now experiencing crashes that seem to be kernel panics but 
>> do not get the usual messages in the IPMI SEL.
> 
> I just tested on stock 5.10.159 and it worked without issue.  Everything
> you have below looks ok.
> 
> Can you test by causing a crash with:
> 
>  echo c >/proc/sysrq-trigger
> 
> and see if it works?

Yeah, already tried that and unfortunately that _doesn’t_ work.

> It sounds like you are having some type of crash that you would normally
> use the IPMI logs to debug.  However, they aren't perfect, the system
> has to stay up long enough to get them into the event log.

I think they are staying up long enough, because a panic triggers the 255-second 
bump in the watchdog and only then do they go down. However, I’ve also noticed 
that the kernel _should_ be rebooting much faster after a panic (and not rely on 
the watchdog), and that doesn’t happen either. (Sorry, this just popped up from 
the back of my head.)
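
For reference, a minimal sketch of how the two settings touched on above could
be inspected on a running machine. It assumes the standard Linux locations
/proc/sys/kernel/panic (the reboot-after-panic timeout) and
/sys/module/ipmi_msghandler/parameters/panic_op (what the IPMI message handler
does on a panic); the function names are made up for illustration and the
paths may differ or be absent on a given kernel build.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a proc/sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def panic_report(root="/"):
    """Collect panic-related settings below `root` (hypothetical helper).

    `root` exists only so the function can be pointed at a test directory;
    on a real system the default "/" is what you want.
    """
    root = Path(root)
    settings = {
        # Assumed path: seconds until reboot after a panic.
        "kernel.panic": root / "proc/sys/kernel/panic",
        # Assumed path: IPMI panic handling mode (e.g. none, event, string).
        "ipmi_msghandler.panic_op": root
        / "sys/module/ipmi_msghandler/parameters/panic_op",
    }
    return {name: read_setting(path) for name, path in settings.items()}


def describe_panic_timeout(value):
    """Translate the kernel.panic value into a human-readable sentence."""
    if value is None:
        return "unknown (file not readable)"
    seconds = int(value)
    if seconds == 0:
        return "no automatic reboot after a panic"
    if seconds < 0:
        return "reboot immediately after a panic"
    return f"reboot {seconds} seconds after a panic"


if __name__ == "__main__":
    report = panic_report()
    for name, value in report.items():
        print(f"{name} = {value}")
    print("kernel.panic means:", describe_panic_timeout(report["kernel.panic"]))
```

A kernel.panic value of 0 would be consistent with the machine sitting there
until the watchdog fires instead of rebooting on its own, so it seems worth
ruling out before digging deeper.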

> In this situation, getting a serial console (probably through IPMI
> Serial over LAN) and getting the console output on a crash is probably
> your best option.  You can use ipmitool for this, or I have a library
> that is able to make connections to serial ports, including through IPMI
> SoL.

Yup. Been there, too. :)

Unfortunately we’re currently chasing something that pops up very randomly on 
somewhat odd machines and I also have the feeling that it’s systematically 
broken right now (as the “echo c” doesn’t work).

Thanks a lot,
Christian

-- 
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
