On 17 September, 2021 - Corey Minyard wrote:

> On Fri, Sep 17, 2021 at 02:55:25PM +0200, Anton Lundin wrote:
> > On 17 September, 2021 - Corey Minyard wrote:
> >
> > > On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > > > On 16 September, 2021 - Corey Minyard wrote:
> > > >
> > > > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > > > Hi.
> > > > > >
> > > > > > I've just upgraded the kernel we're using in a product from
> > > > > > 4.19 to 5.10, and I noticed an issue.
> > > > > >
> > > > > > It started with us no longer getting panic and oops dumps in
> > > > > > our ERST-backed pstore, and while debugging that I noticed that
> > > > > > the reboot-on-panic timer didn't work either.
> > > > > >
> > > > > > I've bisected it down to 2033f6858970
> > > > > > ("ipmi: Free receive messages when in an oops").
> > > > >
> > > > > Hmm. Unfortunately, removing that will break other things. Can
> > > > > you try the following patch? It's a good idea, in general, to do
> > > > > as little as possible in the panic path; this should cover a
> > > > > multitude of issues.
> > > > >
> > > > > Thanks for the report.
> > > >
> > > > I'm sorry to report that the patch didn't solve the issue, and the
> > > > machine locked up in the panic path as before.
> > >
> > > I missed something. Can you try the following? If this doesn't work,
> > > I'm going to have to figure out how to reproduce this.
> >
> > Sorry, still no joy.
> >
> > My guess is that something is locking up because these Supermicro
> > machines have their ERST memory backed by the BMC, and the same BMC
> > is the other end of all the IPMI communication.
> >
> > I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i, but I'm
> > guessing it can be reproduced on most, if not all, of their hardware
> > with the same setup.
> >
> > We're using the ERST backend for pstore because we're still
> > BIOS-booting them and don't have EFI services available to use as a
> > pstore backend.
> >
> > I've tested simply yanking the IPMI modules out of the kernel, and that
> > fixes the panic timer, and we get crash dumps in pstore.
>
> Dang. I'm going to have to look deeper at what that could change to
> cause an issue like this. Are you using the IPMI watchdog? Do you have
> CONFIG_IPMI_PANIC_EVENT=y set in the config?

# CONFIG_IPMI_PANIC_EVENT is not set

We're using the IPMI watchdog.

//Anton

_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer