Well, dang, I had already fixed this a year and a half ago.  I wish I
had a better memory.

Anyway, the fix is commit db05ddf7f321634c5659a0cf7ea56594e22365f7
("ipmi:watchdog: Set panic count to proper value on a panic") in
mainline 5.16.  I'm attaching that patch.

-corey

On Tue, Mar 14, 2023 at 03:58:26PM +0100, Christian Theune via 
Openipmi-developer wrote:
> Awesome!
> 
> > On 14. Mar 2023, at 15:54, Corey Minyard <miny...@acm.org> wrote:
> > 
> > On Tue, Mar 14, 2023 at 03:22:39PM +0100, Christian Theune via 
> > Openipmi-developer wrote:
> >> Hi,
> >> 
> >> sorry, I didn’t expect you to make me a branch. I had already taken your 
> >> diff over to 5.10 as it applied cleanly … sorry for the additional work 
> >> and thanks anyways.
> > 
> > Ok, that's great.  It's something about the IPMI watchdog panic
> > routines, and I can reproduce.  I should be able to fix this pretty
> > quickly.  I'll send a patch when I get this fixed.
> > 
> > Thanks,
> > 
> > -corey
> > 
> >> 
> >> Here’s the output:
> >> 
> >> [ 6521.905890] sysrq: Trigger a crash
> >> [ 6521.909294] Kernel panic - not syncing: sysrq triggered crash
> > >> [ 6521.915026] CPU: 1 PID: 43785 Comm: bash Tainted: G          I       5.10.159 #1-NixOS
> > >> [ 6521.922925] Hardware name: Dell Inc. PowerEdge R510/00HDP0, BIOS 1.11.0 07/23/2012
> >> [ 6521.930475] Call Trace:
> >> [ 6521.932923]  dump_stack+0x6b/0x83
> >> [ 6521.936230]  panic+0x101/0x2c8
> >> [ 6521.939276]  ? printk+0x58/0x73
> >> [ 6521.942408]  sysrq_handle_crash+0x16/0x20
> >> [ 6521.946407]  __handle_sysrq.cold+0x43/0x11a
> >> [ 6521.950580]  write_sysrq_trigger+0x24/0x40
> >> [ 6521.954668]  proc_reg_write+0x51/0x90
> >> [ 6521.958322]  vfs_write+0xc3/0x280
> >> [ 6521.961627]  ksys_write+0x5f/0xe0
> >> [ 6521.964935]  do_syscall_64+0x33/0x40
> >> [ 6521.968502]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >> [ 6521.973540] RIP: 0033:0x7f2c6b91a133
> > >> [ 6521.977106] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89 d4 55 48 89 f5
> > >> [ 6521.995836] RSP: 002b:00007ffc4cf11088 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > >> [ 6522.003387] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2c6b91a133
> > >> [ 6522.010505] RDX: 0000000000000002 RSI: 0000000001555c08 RDI: 0000000000000001
> > >> [ 6522.017623] RBP: 0000000001555c08 R08: 000000000000000a R09: 00007f2c6b9aaf40
> > >> [ 6522.024743] R10: 00000000016e4218 R11: 0000000000000246 R12: 0000000000000002
> > >> [ 6522.031864] R13: 00007f2c6b9e8520 R14: 00007f2c6b9e8720 R15: 0000000000000002
> > >> [ 6522.039085] Calling notifier panic_event+0x0/0x410 [ipmi_msghandler] (000000008eb8cb44)
> > >> [ 6522.047071] IPMI message handler: IPMI: panic event handler
> > >> [ 6522.052628] IPMI message handler: IPMI: handling panic event for intf 0: 00000000443777b3 0000000067d05ff8
> >> …
> > >> and then it reboots once the watchdog timer's 255 seconds have
> > >> passed.
> >> 
> >> Christian
> >> 
> >>> On 13. Mar 2023, at 18:13, Corey Minyard <miny...@acm.org> wrote:
> >>> 
> >>> On Mon, Mar 13, 2023 at 05:42:39PM +0100, Christian Theune wrote:
> >>>> Hrghs. I’m applying your patch to 5.10 as my distro build infrastructure 
> >>>> has some patches that don’t apply to 6.2 and that I don’t know how to 
> >>>> circumvent quickly enough… :)
> >>> 
> >>> Ok, there's a
> >>> 
> >>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events-5.10
> >>> 
> >>> branch available for you to pull.  It's on top of latest 5.10.
> >>> 
> >>> -corey
> >>> 
> >>>> 
> >>>>> On 13. Mar 2023, at 16:59, Christian Theune <c...@flyingcircus.io> 
> >>>>> wrote:
> >>>>> 
> >>>>> I should be easily able to run 6.2, no worries.
> >>>>> 
> >>>>> 
> >>>>>> On 13. Mar 2023, at 16:33, Corey Minyard <miny...@acm.org> wrote:
> >>>>>> 
> >>>>>> On Mon, Mar 13, 2023 at 02:07:01PM +0100, Christian Theune wrote:
> >>>>>>> Hi,
> >>>>>>> 
> >>>>>>> yeah, the IPMI log is fine. This is a 10 minute interval job in our 
> >>>>>>> system that exports the log and clears it:
> >>>>>>> 
> >>>>>>> The job looks like this:
> >>>>>>> 
> > >>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool sel elist
> > >>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool sel clear
> >>>>>>> 
> >>>>>>> So it’s not atomic but it runs after the boot and the elist should 
> >>>>>>> output it properly … at least it did in the past. ;)
> >>>>>>> 
> >>>>>>> As I said - I’m happy to run any patches you have. If you point me to 
> >>>>>>> a git branch somewhere I can switch that system easily.
> >>>>>> 
> >>>>>> Ok, I have a branch at
> >>>>>> 
> >>>>>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events
> >>>>>> 
> >>>>>> that has debug tracing.  It will print the function for all panic event
> >>>>>> handlers, their return values, and adds traces in the IPMI panic event
> >>>>>> handlers.
> >>>>>> 
> > >>>>>> It's a single patch right on top of 6.2; I'm not sure how portable it is
> > >>>>>> to other kernel versions.  I can port if you like.
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> 
> >>>>>> -corey
> >>>>>> 
> >>>>>>> 
> >>>>>>> Cheers,
> >>>>>>> Christian
> >>>>>>> 
> >>>>>>>>> On 13. Mar 2023, at 13:58, Corey Minyard <miny...@acm.org> wrote:
> >>>>>>>> 
> >>>>>>>> On Mon, Mar 13, 2023 at 10:27:51AM +0100, Christian Theune wrote:
> >>>>>>>>> Hi,
> >>>>>>>>> 
> >>>>>>>>> alright, so here’s the output from the NixOS machine:
> >>>>>>>>> 
> >>>>>>>>> root@xxx ~ # echo c >/proc/sysrq-trigger
> >>>>>>>>> client_loop: send disconnect: Broken pipe
> >>>>>>>>> …
> >>>>>>>>> 
> >>>>>>>>> root@xxx ~ # journalctl -u ipmi-log.service
> > >>>>>>>>> -- Journal begins at Sun 2023-02-26 14:25:36 CET, ends at Mon 2023-03-13 10:25:27 CET. --
> > >>>>>>>>> Mar 13 10:12:38 xxx ipmi-log-start[520973]: Clearing SEL.  Please allow a few seconds to erase.
> > >>>>>>>>> ...
> > >>>>>>>>> -- Boot fdef496e784e4541abd9ae40df472a0b --
> > >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]:    1 | 03/13/2023 | 09:12:49 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
> > >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]:    2 | 03/13/2023 | 09:21:06 | Watchdog2 OS Watchdog | Hard reset | Asserted
> > >>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1977]: Clearing SEL.  Please allow a few seconds to erase.
> >>>>>>>> 
> >>>>>>>> Hmm, the SEL got cleared.  That would clear out any of the logs that
> >>>>>>>> were issued before that time.  I'm not sure when the above happened
> > >>>>>>>> versus the crash, though.  It looks like it occurred as part of the
> >>>>>>>> reboot, but I'm not sure what I'm seeing.  Maybe you have a startup
> >>>>>>>> process that clears the SEL?
> >>>>>>>> 
> > >>>>>>>> Assuming that's not the issue, what you have looks ok.  I'd need to add
> > >>>>>>>> some logs to the kernel to see if the log operation ever happens.
> >>>>>>>> 
> >>>>>>>> -corey
> >>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> The SOL log looks like this:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> [1107585.917689] sysrq: Trigger a crash
> >>>>>>>>> [1107585.921272] Kernel panic - not syncing: sysrq triggered crash
> > >>>>>>>>> [1107585.927178] CPU: 1 PID: 521033 Comm: bash Tainted: G          I       5.10.159 #1-NixOS
> > >>>>>>>>> [1107585.935335] Hardware name: Dell Inc. PowerEdge R510/00HDP0, BIOS 1.11.0 07/23/2012
> >>>>>>>>> [1107585.943058] Call Trace:
> >>>>>>>>> [1107585.945680]  dump_stack+0x6b/0x83
> >>>>>>>>> [1107585.949158]  panic+0x101/0x2c8
> >>>>>>>>> [1107585.952379]  ? printk+0x58/0x73
> >>>>>>>>> [1107585.955687]  sysrq_handle_crash+0x16/0x20
> >>>>>>>>> [1107585.959859]  __handle_sysrq.cold+0x43/0x11a
> >>>>>>>>> [1107585.964203]  write_sysrq_trigger+0x24/0x40
> >>>>>>>>> [1107585.968463]  proc_reg_write+0x51/0x90
> >>>>>>>>> [1107585.972290]  vfs_write+0xc3/0x280
> >>>>>>>>> [1107585.975768]  ksys_write+0x5f/0xe0
> >>>>>>>>> [1107585.979248]  do_syscall_64+0x33/0x40
> >>>>>>>>> [1107585.982987]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >>>>>>>>> [1107585.988199] RIP: 0033:0x7f5873932133
> > >>>>>>>>> [1107585.991938] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89 d4 55 48 89 f5
> > >>>>>>>>> [1107586.010842] RSP: 002b:00007ffcc13808c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > >>>>>>>>> [1107586.018566] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5873932133
> > >>>>>>>>> [1107586.025923] RDX: 0000000000000002 RSI: 00000000005c1c08 RDI: 0000000000000001
> > >>>>>>>>> [1107586.033213] RBP: 00000000005c1c08 R08: 000000000000000a R09: 00007f58739c2f40
> > >>>>>>>>> [1107586.040504] R10: 00000000005cc348 R11: 0000000000000246 R12: 0000000000000002
> > >>>>>>>>> [1107586.047794] R13: 00007f5873a00520 R14: 00007f5873a00720 R15: 0000000000000002
> >>>>>>>>> 
> >>>>>>>>> Nothing obvious to me here … if you have any further ideas what to 
> >>>>>>>>> test, let me know. I should be more responsive again now.
> >>>>>>>>> 
> >>>>>>>>> Thanks and kind regards,
> >>>>>>>>> Christian
> >>>>>>>>> 
> >>>>>>>>>> On 5. Mar 2023, at 23:53, Corey Minyard <miny...@acm.org> wrote:
> >>>>>>>>>> 
> >>>>>>>>>> On Wed, Mar 01, 2023 at 06:00:07PM +0100, Christian Theune wrote:
> > >>>>>>>>>>> I’m going to actually attach a serial console to watch the “echo c”
> > >>>>>>>>>>> panic, maybe that gives _some_ indication.
> >>>>>>>>>>> 
> >>>>>>>>>>> Otherwise: I can quickly run patches on the kernel there to try 
> >>>>>>>>>>> out things. (And the funding offer still stands.)
> >>>>>>>>>> 
> >>>>>>>>>> Any news on this?  I'm curious what this could be.
> >>>>>>>>>> 
> >>>>>>>>>> -corey
> >>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> Christian
> >>>>>>>>>>> 
> >>>>>>>>>>>> On 1. Mar 2023, at 17:58, Corey Minyard <miny...@acm.org> wrote:
> >>>>>>>>>>>> 
> >>>>>>>>>>>> On Tue, Feb 28, 2023 at 06:36:17PM +0100, Christian Theune wrote:
> >>>>>>>>>>>>> Thanks, both machines report:
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> # cat /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>> string
> >>>>>>>>>>>> 
> > >>>>>>>>>>>> At this point, I have no idea.  I'd have to start adding printks into
> > >>>>>>>>>>>> the code and cause crashes to see what is happening.
> >>>>>>>>>>>> 
> > >>>>>>>>>>>> Maybe something is getting in the way of the panic notifiers and doing
> > >>>>>>>>>>>> something to prevent the IPMI driver from working.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> -corey
> >>>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>>> On 28. Feb 2023, at 18:04, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Oh, I forgot.  You can look at panic_op in 
> >>>>>>>>>>>>>> /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> -corey
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 05:48:07PM +0100, Christian Theune via 
> >>>>>>>>>>>>>> Openipmi-developer wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> On 28. Feb 2023, at 17:36, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian Theune 
> >>>>>>>>>>>>>>>> via Openipmi-developer wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> I’ve been trying to debug the PANIC and OEM string handling 
> >>>>>>>>>>>>>>>>> and am running out of ideas whether this is a bug or 
> >>>>>>>>>>>>>>>>> whether something so subtle has changed in my config that 
> >>>>>>>>>>>>>>>>> I’m just not seeing it.
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> (Note: I’m willing to pay for consulting.)
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> Probably not necessary.
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Thanks! The offer always stands. If we should ever meet I’m 
> >>>>>>>>>>>>>>> also able to pay in beverages. ;)
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> I have machines that we’ve moved from an older setup 
> >>>>>>>>>>>>>>>>> (Gentoo, (mostly) vanilla kernel 4.19.157) to a newer setup 
> >>>>>>>>>>>>>>>>> (NixOS, (mostly) vanilla kernel 5.10.159) and I’m now 
> >>>>>>>>>>>>>>>>> experiencing crashes that seem to be kernel panics but do 
> >>>>>>>>>>>>>>>>> not get the usual messages in the IPMI SEL.
> >>>>>>>>>>>>>>>> 
> > >>>>>>>>>>>>>>>> I just tested on stock 5.10.159 and it worked without issue.  Everything
> > >>>>>>>>>>>>>>>> you have below looks ok.
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> Can you test by causing a crash with:
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> echo c >/proc/sysrq-trigger
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> and see if it works?
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Yeah, already tried that and unfortunately that _doesn’t_ 
> >>>>>>>>>>>>>>> work.
> >>>>>>>>>>>>>>> 
> > >>>>>>>>>>>>>>>> It sounds like you are having some type of crash that you would normally
> > >>>>>>>>>>>>>>>> use the IPMI logs to debug.  However, they aren't perfect; the system
> > >>>>>>>>>>>>>>>> has to stay up long enough to get them into the event log.
> >>>>>>>>>>>>>>> 
> > >>>>>>>>>>>>>>> I think they are staying up long enough, because a panic
> > >>>>>>>>>>>>>>> triggers the 255-second bump in the watchdog and only then passes
> > >>>>>>>>>>>>>>> on. However, I’ve also noticed that the kernel _should_ be
> > >>>>>>>>>>>>>>> rebooting after a panic much faster (and not rely on the
> > >>>>>>>>>>>>>>> watchdog), and that doesn’t happen either. (Sorry, this just
> > >>>>>>>>>>>>>>> popped up from the back of my head.)
> >>>>>>>>>>>>>>> 
> > >>>>>>>>>>>>>>>> In this situation, getting a serial console (probably through IPMI
> > >>>>>>>>>>>>>>>> Serial over LAN) and getting the console output on a crash is probably
> > >>>>>>>>>>>>>>>> your best option.  You can use ipmitool for this, or I have a library
> > >>>>>>>>>>>>>>>> that is able to make connections to serial ports, including through IPMI
> > >>>>>>>>>>>>>>>> SoL.
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Yup. Been there, too. :)
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Unfortunately we’re currently chasing something that pops up 
> >>>>>>>>>>>>>>> very randomly on somewhat odd machines and I also have the 
> >>>>>>>>>>>>>>> feeling that it’s systematically broken right now (as the 
> >>>>>>>>>>>>>>> “echo c” doesn’t work).
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Thanks a lot,
> >>>>>>>>>>>>>>> Christian
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> -- 
> >>>>>>>>>>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> > >>>>>>>>>>>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> > >>>>>>>>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> > >>>>>>>>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>> Openipmi-developer mailing list
> >>>>>>>>>>>>>>> Openipmi-developer@lists.sourceforge.net
> >>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
> >>>>>>>>>>>>> 
> > >>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>> Christian Theune
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> -- 
> >>>>>>>>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>>>>>>>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >>>>>>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, 
> >>>>>>>>>>>>> Christian Zagrodnick
> >>>>>>>>>>>>> 
> >>>>>>>>>>> 
> > >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Christian Theune
> >>>>>>>>>>> 
> >>>>>>>>>>> -- 
> >>>>>>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>>>>>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >>>>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, 
> >>>>>>>>>>> Christian Zagrodnick
> >>>>>>>>>>> 
> >>>>>>>>> 
> > >>>>>>>>> Kind regards,
> >>>>>>>>> Christian Theune
> >>>>>>>>> 
> >>>>>>>>> -- 
> >>>>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>>>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian 
> >>>>>>>>> Zagrodnick
> >>>>>>> 
> >>>>>>> 
> > >>>>>>> Kind regards,
> >>>>>>> Christian Theune
> >>>>>>> 
> >>>>>>> -- 
> >>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian 
> >>>>>>> Zagrodnick
> >>>>>>> 
> >>>> 
> > >>>> Kind regards,
> >>>> Christian Theune
> >>>> 
> >>>> -- 
> >>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian 
> >>>> Zagrodnick
> >>>> 
> >> 
> > >> Kind regards,
> >> Christian Theune
> >> 
> >> -- 
> >> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> >> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian 
> >> Zagrodnick
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> Openipmi-developer mailing list
> >> Openipmi-developer@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
> 
> Kind regards,
> Christian Theune
> 
From db05ddf7f321634c5659a0cf7ea56594e22365f7 Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminy...@mvista.com>
Date: Mon, 20 Sep 2021 06:25:37 -0500
Subject: [PATCH] ipmi:watchdog: Set panic count to proper value on a panic

You will get two decrements when the messages on a panic are sent, not
one, since commit 2033f6858970 ("ipmi: Free receive messages when in an
oops") was added, but the watchdog code had a bug where it didn't set
the value properly.

Reported-by: Anton Lundin <gla...@acc.umu.se>
Cc: <sta...@vger.kernel.org> # v5.4+
Fixes: 2033f6858970 ("ipmi: Free receive messages when in an oops")
Signed-off-by: Corey Minyard <cminy...@mvista.com>
---
 drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index e4ff3b50de7f..f855a9665c28 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -497,7 +497,7 @@ static void panic_halt_ipmi_heartbeat(void)
 	msg.cmd = IPMI_WDOG_RESET_TIMER;
 	msg.data = NULL;
 	msg.data_len = 0;
-	atomic_inc(&panic_done_count);
+	atomic_add(2, &panic_done_count);
 	rv = ipmi_request_supply_msgs(watchdog_user,
 				      (struct ipmi_addr *) &addr,
 				      0,
@@ -507,7 +507,7 @@ static void panic_halt_ipmi_heartbeat(void)
 				      &panic_halt_heartbeat_recv_msg,
 				      1);
 	if (rv)
-		atomic_dec(&panic_done_count);
+		atomic_sub(2, &panic_done_count);
 }
 
 static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -531,12 +531,12 @@ static void panic_halt_ipmi_set_timeout(void)
 	/* Wait for the messages to be free. */
 	while (atomic_read(&panic_done_count) != 0)
 		ipmi_poll_interface(watchdog_user);
-	atomic_inc(&panic_done_count);
+	atomic_add(2, &panic_done_count);
 	rv = __ipmi_set_timeout(&panic_halt_smi_msg,
 				&panic_halt_recv_msg,
 				&send_heartbeat_now);
 	if (rv) {
-		atomic_dec(&panic_done_count);
+		atomic_sub(2, &panic_done_count);
 		pr_warn("Unable to extend the watchdog timeout\n");
 	} else {
 		if (send_heartbeat_now)
-- 
2.34.1

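For anyone following the counter logic in the patch, here is a small, hypothetical C11 model of the panic_done_count arithmetic. It is not the driver code; the model_* functions and FREES_PER_MESSAGE are made-up names for illustration. It assumes, as the commit message says, that each message sent during a panic now causes two decrements when it is freed (presumably one per free callback; that split is an assumption of this sketch). Reserving only 1, as the old atomic_inc() did, leaves the counter at -1, so a later "while (atomic_read(&panic_done_count) != 0)" wait never sees zero in this model; reserving 2, as the fixed atomic_add(2, ...) does, brings it back to 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of the panic_done_count bookkeeping in
 * drivers/char/ipmi/ipmi_watchdog.c.  No IPMI interface is involved;
 * only the counter arithmetic is reproduced.
 *
 * Assumption (from the commit message): every message sent during a
 * panic is released through two free callbacks, each of which
 * decrements the counter once.
 */
#define FREES_PER_MESSAGE 2
static_assert(FREES_PER_MESSAGE == 2, "model assumes two decrements per message");

/* Static storage: zero-initialized, which is a valid atomic state. */
static atomic_int panic_done_count;

/* Stand-in for one free callback: decrements the counter once. */
static void model_free_callback(void)
{
	atomic_fetch_sub(&panic_done_count, 1);
}

/*
 * Send one message while panicking: first add `reservation` to the
 * counter (1 before the fix, 2 after it), then let the message complete
 * and run every free callback.  Returns the counter afterwards.
 */
static int model_send_panic_msg(int reservation)
{
	atomic_fetch_add(&panic_done_count, reservation);
	for (int i = 0; i < FREES_PER_MESSAGE; i++)
		model_free_callback();
	return atomic_load(&panic_done_count);
}

/*
 * The driver spins on "counter != 0" before sending the next message.
 * Rather than spinning, report whether that loop could exit now.  With
 * nothing left in flight, a negative counter can never climb back to 0.
 */
static bool model_wait_would_finish(void)
{
	return atomic_load(&panic_done_count) == 0;
}

/* Reset the counter between experiments. */
static void model_reset(void)
{
	atomic_store(&panic_done_count, 0);
}
```

Starting from a zeroed counter, model_send_panic_msg(1) returns -1 and model_wait_would_finish() is then false, while model_send_panic_msg(2) returns 0 and the wait can finish. The error paths in the patch (atomic_sub(2, ...) when the request fails) simply undo the reservation, since no free callbacks run in that case.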