I’m going to attach a serial console and watch the “echo c” panic; 
maybe that gives _some_ indication.

Otherwise: I can quickly run patches on the kernel there to try out things. 
(And the funding offer still stands.)

Christian

> On 1. Mar 2023, at 17:58, Corey Minyard <miny...@acm.org> wrote:
> 
> On Tue, Feb 28, 2023 at 06:36:17PM +0100, Christian Theune wrote:
>> Thanks, both machines report:
>> 
>> # cat /sys/module/ipmi_msghandler/parameters/panic_op
>> string
> 
> At this point, I have no idea.  I'd have to start adding printks into
> the code and cause crashes to see what is happening.
> 
> Maybe something is getting in the way of the panic notifiers and doing
> something to prevent the IPMI driver from working.
> 
> -corey
> 
>> 
>> 
>>> On 28. Feb 2023, at 18:04, Corey Minyard <miny...@acm.org> wrote:
>>> 
>>> Oh, I forgot.  You can look at panic_op in 
>>> /sys/module/ipmi_msghandler/parameters/panic_op
>>> 
>>> -corey
>>> 
>>> On Tue, Feb 28, 2023 at 05:48:07PM +0100, Christian Theune via 
>>> Openipmi-developer wrote:
>>>> Hi,
>>>> 
>>>>> On 28. Feb 2023, at 17:36, Corey Minyard <miny...@acm.org> wrote:
>>>>> 
>>>>> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian Theune via 
>>>>> Openipmi-developer wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I’ve been trying to debug the PANIC and OEM string handling and am 
>>>>>> running out of ideas: I can’t tell whether this is a bug or whether 
>>>>>> something subtle has changed in my config that I’m just not seeing.
>>>>>> 
>>>>>> (Note: I’m willing to pay for consulting.)
>>>>> 
>>>>> Probably not necessary.
>>>> 
>>>> Thanks! The offer always stands. If we should ever meet I’m also able to 
>>>> pay in beverages. ;)
>>>> 
>>>>>> I have machines that we’ve moved from an older setup (Gentoo, (mostly) 
>>>>>> vanilla kernel 4.19.157) to a newer setup (NixOS, (mostly) vanilla 
>>>>>> kernel 5.10.159) and I’m now experiencing crashes that seem to be kernel 
>>>>>> panics but do not get the usual messages in the IPMI SEL.
>>>>> 
>>>>> I just tested on stock 5.10.159 and it worked without issue.  Everything
>>>>> you have below looks ok.
>>>>> 
>>>>> Can you test by causing a crash with:
>>>>> 
>>>>> echo c >/proc/sysrq-trigger
>>>>> 
>>>>> and see if it works?
>>>> 
>>>> Yeah, already tried that and unfortunately that _doesn’t_ work.
>>>> 
>>>>> It sounds like you are having some type of crash that you would normally
>>>>> use the IPMI logs to debug.  However, they aren't perfect: the system
>>>>> has to stay up long enough to get them into the event log.
>>>> 
>>>> I think they are staying up long enough, because a panic triggers the 
>>>> 255-second bump in the watchdog and only then do they pass on. However, 
>>>> I’ve also noticed that the kernel _should_ be rebooting much faster after 
>>>> a panic (without relying on the watchdog), and that doesn’t happen either. 
>>>> (Sorry, this just popped up from the back of my head.)
>>>> 
>>>>> In this situation, getting a serial console (probably through IPMI
>>>>> Serial over LAN) and getting the console output on a crash is probably
>>>>> your best option.  You can use ipmitool for this, or I have a library
>>>>> that is able to make connections to serial ports, including through IPMI
>>>>> SoL.
>>>> 
>>>> Yup. Been there, too. :)
>>>> 
>>>> Unfortunately we’re currently chasing something that pops up very randomly 
>>>> on somewhat odd machines and I also have the feeling that it’s 
>>>> systematically broken right now (as the “echo c” doesn’t work).
>>>> 
>>>> Thanks a lot,
>>>> Christian
>>>> 
>>>> -- 
>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Germany
>>>> HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian 
>>>> Zagrodnick
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Openipmi-developer mailing list
>>>> Openipmi-developer@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
>> 
>> Kind regards,
>> Christian Theune
>> 
>> 

Kind regards,
Christian Theune




