On Wed, Mar 15, 2023 at 01:12:05PM +0100, Christian Theune via 
Openipmi-developer wrote:
> Ah, fantastic! That explains it of course … :)
> 
> From my side I guess this works and I don’t have to retry with that, but I’d 
> be happy to just wait for 5.10.175 … or would you prefer me explicitly 
> testing your original?

We can just wait.  The problem is obvious now, and the backports are in
progress.
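
For anyone following along on a 5.10.y kernel, here is a minimal sketch for checking whether the running kernel is already at the release discussed above. It assumes the backport lands in 5.10.175 as Christian expects, and that `uname -r` carries the stable patch level (as it does in the NixOS trace below, "5.10.159"). The helper names `version_ge` and `kernel_base` are made up for illustration.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils and busybox.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip a local suffix such as "-generic" from a kernel release string.
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Assumption: the fix is expected in 5.10.175, per the discussion in this thread.
wanted="5.10.175"
running="$(kernel_base "$(uname -r)")"

case "$running" in
    5.10.*)
        if version_ge "$running" "$wanted"; then
            echo "kernel $running is at or past $wanted"
        else
            echo "kernel $running predates $wanted"
        fi
        ;;
    *)
        # The comparison only makes sense within the 5.10.y stable series.
        echo "kernel $running is not a 5.10.y kernel; this check does not apply"
        ;;
esac
```

The check is deliberately limited to 5.10.y, since other stable series get their backports at different patch levels.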

Thanks for helping me with this.

-corey

> 
> Christian
> 
> > On 15. Mar 2023, at 13:07, Corey Minyard <miny...@acm.org> wrote:
> > 
> > On Wed, Mar 15, 2023 at 07:32:41AM +0100, Christian Theune via 
> > Openipmi-developer wrote:
> >> Hi,
> >> 
> >> that didn’t apply on 5.10. Here’s what I’m currently trying to build after 
> >> manually inspecting the rejected patch:
> >> 
> > 
> > Well, I guess I should have sent the prerequisite patch, too.  Here it
> > is:
> > 
> > a01a89b1db ("ipmi/watchdog: replace atomic_add() and atomic_sub()")
> > 
> > Also attached.
> > 
> > -corey
> > 
> >> 
> >> 
> >>> On 14. Mar 2023, at 18:29, Corey Minyard <miny...@acm.org> wrote:
> >>> 
> >>> Well, dang, I had already fixed this a year and a half ago.  I wish I
> >>> had a better memory.
> >>> 
> >>> Anyway, the fix is commit db05ddf7f321634c5659a0cf7ea56594e22365f7
> >>> ("ipmi:watchdog: Set panic count to proper value on a panic") in
> >>> mainstream 5.16.  I'm attaching that patch.
> >>> 
> >>> -corey
> >>> 
> >>> On Tue, Mar 14, 2023 at 03:58:26PM +0100, Christian Theune via 
> >>> Openipmi-developer wrote:
> >>>> Awesome!
> >>>> 
> >>>>> On 14. Mar 2023, at 15:54, Corey Minyard <miny...@acm.org> wrote:
> >>>>> 
> >>>>> On Tue, Mar 14, 2023 at 03:22:39PM +0100, Christian Theune via 
> >>>>> Openipmi-developer wrote:
> >>>>>> Hi,
> >>>>>> 
> >>>>>> sorry, I didn’t expect you to make me a branch. I had already taken 
> >>>>>> your diff over to 5.10 as it applied cleanly … sorry for the 
> >>>>>> additional work and thanks anyways.
> >>>>> 
> >>>>> Ok, that's great.  It's something about the IPMI watchdog panic
> >>>>> routines, and I can reproduce.  I should be able to fix this pretty
> >>>>> quickly.  I'll send a patch when I get this fixed.
> >>>>> 
> >>>>> Thanks,
> >>>>> 
> >>>>> -corey
> >>>>> 
> >>>>>> 
> >>>>>> Here’s the output:
> >>>>>> 
> >>>>>> [ 6521.905890] sysrq: Trigger a crash
> >>>>>> [ 6521.909294] Kernel panic - not syncing: sysrq triggered crash
> >>>>>> [ 6521.915026] CPU: 1 PID: 43785 Comm: bash Tainted: G          I       5.10.159 #1-NixOS
> >>>>>> [ 6521.922925] Hardware name: Dell Inc. PowerEdge R510/00HDP0, BIOS 1.11.0 07/23/2012
> >>>>>> [ 6521.930475] Call Trace:
> >>>>>> [ 6521.932923]  dump_stack+0x6b/0x83
> >>>>>> [ 6521.936230]  panic+0x101/0x2c8
> >>>>>> [ 6521.939276]  ? printk+0x58/0x73
> >>>>>> [ 6521.942408]  sysrq_handle_crash+0x16/0x20
> >>>>>> [ 6521.946407]  __handle_sysrq.cold+0x43/0x11a
> >>>>>> [ 6521.950580]  write_sysrq_trigger+0x24/0x40
> >>>>>> [ 6521.954668]  proc_reg_write+0x51/0x90
> >>>>>> [ 6521.958322]  vfs_write+0xc3/0x280
> >>>>>> [ 6521.961627]  ksys_write+0x5f/0xe0
> >>>>>> [ 6521.964935]  do_syscall_64+0x33/0x40
> >>>>>> [ 6521.968502]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >>>>>> [ 6521.973540] RIP: 0033:0x7f2c6b91a133
> >>>>>> [ 6521.977106] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89 d4 55 48 89 f5
> >>>>>> [ 6521.995836] RSP: 002b:00007ffc4cf11088 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >>>>>> [ 6522.003387] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2c6b91a133
> >>>>>> [ 6522.010505] RDX: 0000000000000002 RSI: 0000000001555c08 RDI: 0000000000000001
> >>>>>> [ 6522.017623] RBP: 0000000001555c08 R08: 000000000000000a R09: 00007f2c6b9aaf40
> >>>>>> [ 6522.024743] R10: 00000000016e4218 R11: 0000000000000246 R12: 0000000000000002
> >>>>>> [ 6522.031864] R13: 00007f2c6b9e8520 R14: 00007f2c6b9e8720 R15: 0000000000000002
> >>>>>> [ 6522.039085] Calling notifier panic_event+0x0/0x410 [ipmi_msghandler] (000000008eb8cb44)
> >>>>>> [ 6522.047071] IPMI message handler: IPMI: panic event handler
> >>>>>> [ 6522.052628] IPMI message handler: IPMI: handling panic event for intf 0: 00000000443777b3 0000000067d05ff8
> >>>>>> …
> >>>>>> and then it reboots after the watchdog timer’s 255 seconds have 
> >>>>>> passed.
> >>>>>> 
> >>>>>> Christian
> >>>>>> 
> >>>>>>> On 13. Mar 2023, at 18:13, Corey Minyard <miny...@acm.org> wrote:
> >>>>>>> 
> >>>>>>> On Mon, Mar 13, 2023 at 05:42:39PM +0100, Christian Theune wrote:
> >>>>>>>> Hrghs. I’m applying your patch to 5.10 as my distro build 
> >>>>>>>> infrastructure has some patches that don’t apply to 6.2 and that I 
> >>>>>>>> don’t know how to circumvent quickly enough… :)
> >>>>>>> 
> >>>>>>> Ok, there's a
> >>>>>>> 
> >>>>>>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events-5.10
> >>>>>>> 
> >>>>>>> branch available for you to pull.  It's on top of latest 5.10.
> >>>>>>> 
> >>>>>>> -corey
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On 13. Mar 2023, at 16:59, Christian Theune <c...@flyingcircus.io> 
> >>>>>>>>> wrote:
> >>>>>>>>> 
> >>>>>>>>> I should be easily able to run 6.2, no worries.
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> On 13. Mar 2023, at 16:33, Corey Minyard <miny...@acm.org> wrote:
> >>>>>>>>>> 
> >>>>>>>>>> On Mon, Mar 13, 2023 at 02:07:01PM +0100, Christian Theune wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> 
> >>>>>>>>>>> yeah, the IPMI log is fine. This is a 10 minute interval job in 
> >>>>>>>>>>> our system that exports the log and clears it:
> >>>>>>>>>>> 
> >>>>>>>>>>> The job looks like this:
> >>>>>>>>>>> 
> >>>>>>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool sel elist
> >>>>>>>>>>> /nix/store/m7lb36dr93qj27r9vskmjihz8imywy86-ipmitool-1.8.18/bin/ipmitool sel clear
> >>>>>>>>>>> 
> >>>>>>>>>>> So it’s not atomic but it runs after the boot and the elist 
> >>>>>>>>>>> should output it properly … at least it did in the past. ;)
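
Since the export job is not atomic, one way to narrow the window is to clear the SEL only after the export has been written out successfully. A rough sketch, assuming `ipmitool` is in the PATH (override with the `IPMITOOL` variable); the function name `export_and_clear_sel` is made up for illustration. Events logged between the two commands can still be lost, so this narrows the race rather than closing it.

```shell
#!/bin/sh
# Path to ipmitool; an assumption, override via the environment if needed.
IPMITOOL="${IPMITOOL:-ipmitool}"

# Export the SEL to the file given as $1, and clear the SEL only if the
# export succeeded.  Returns non-zero (and does not clear) on export failure.
export_and_clear_sel() {
    out="$1"
    tmp="${out}.tmp"
    # Write to a temporary file first so a failed export never leaves a
    # partial log behind under the final name.
    if "$IPMITOOL" sel elist >"$tmp"; then
        mv "$tmp" "$out"
        "$IPMITOOL" sel clear
    else
        rm -f "$tmp"
        echo "SEL export failed; not clearing" >&2
        return 1
    fi
}
```

It could be called from the periodic job as `export_and_clear_sel /var/log/ipmi-sel.log`, where the log path is just an example.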
> >>>>>>>>>>> 
> >>>>>>>>>>> As I said - I’m happy to run any patches you have. If you point 
> >>>>>>>>>>> me to a git branch somewhere I can switch that system easily.
> >>>>>>>>>> 
> >>>>>>>>>> Ok, I have a branch at
> >>>>>>>>>> 
> >>>>>>>>>> https://github.com/cminyard/linux-ipmi.git:debug-panic-oem-events
> >>>>>>>>>> 
> >>>>>>>>>> that has debug tracing.  It prints the function for every panic event
> >>>>>>>>>> handler and its return value, and adds traces in the IPMI panic event
> >>>>>>>>>> handlers.
> >>>>>>>>>> 
> >>>>>>>>>> It's a single patch right on top of 6.2; I'm not sure how portable it is
> >>>>>>>>>> to other kernel versions.  I can port it if you like.
> >>>>>>>>>> 
> >>>>>>>>>> Thanks,
> >>>>>>>>>> 
> >>>>>>>>>> -corey
> >>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Christian
> >>>>>>>>>>> 
> >>>>>>>>>>>>> On 13. Mar 2023, at 13:58, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>> 
> >>>>>>>>>>>> On Mon, Mar 13, 2023 at 10:27:51AM +0100, Christian Theune wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> alright, so here’s the output from the NixOS machine:
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> root@xxx ~ # echo c >/proc/sysrq-trigger
> >>>>>>>>>>>>> client_loop: send disconnect: Broken pipe
> >>>>>>>>>>>>> …
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> root@xxx ~ # journalctl -u ipmi-log.service
> >>>>>>>>>>>>> -- Journal begins at Sun 2023-02-26 14:25:36 CET, ends at Mon 2023-03-13 10:25:27 CET. --
> >>>>>>>>>>>>> Mar 13 10:12:38 xxx ipmi-log-start[520973]: Clearing SEL.  Please allow a few seconds to erase.
> >>>>>>>>>>>>> ...
> >>>>>>>>>>>>> -- Boot fdef496e784e4541abd9ae40df472a0b --
> >>>>>>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]:    1 | 03/13/2023 | 09:12:49 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
> >>>>>>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1973]:    2 | 03/13/2023 | 09:21:06 | Watchdog2 OS Watchdog | Hard reset | Asserted
> >>>>>>>>>>>>> Mar 13 10:25:07 xxx ipmi-log-start[1977]: Clearing SEL.  Please allow a few seconds to erase.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Hmm, the SEL got cleared.  That would clear out any of the logs that
> >>>>>>>>>>>> were issued before that time.  I'm not sure when the above happened
> >>>>>>>>>>>> versus the crash, though.  It looks like it occurred as part of the
> >>>>>>>>>>>> reboot, but I'm not sure what I'm seeing.  Maybe you have a startup
> >>>>>>>>>>>> process that clears the SEL?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Assuming that's not the issue, what you have looks ok.  I'd need to add
> >>>>>>>>>>>> some logs to the kernel to see if the log operation ever happens.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> -corey
> >>>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> The SOL log looks like this:
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> [1107585.917689] sysrq: Trigger a crash
> >>>>>>>>>>>>> [1107585.921272] Kernel panic - not syncing: sysrq triggered crash
> >>>>>>>>>>>>> [1107585.927178] CPU: 1 PID: 521033 Comm: bash Tainted: G          I 5.10.159 #1-NixOS
> >>>>>>>>>>>>> [1107585.935335] Hardware name: Dell Inc. PowerEdge R510/00HDP0, BIOS 1.11.0 07/23/2012
> >>>>>>>>>>>>> [1107585.943058] Call Trace:
> >>>>>>>>>>>>> [1107585.945680]  dump_stack+0x6b/0x83
> >>>>>>>>>>>>> [1107585.949158]  panic+0x101/0x2c8
> >>>>>>>>>>>>> [1107585.952379]  ? printk+0x58/0x73
> >>>>>>>>>>>>> [1107585.955687]  sysrq_handle_crash+0x16/0x20
> >>>>>>>>>>>>> [1107585.959859]  __handle_sysrq.cold+0x43/0x11a
> >>>>>>>>>>>>> [1107585.964203]  write_sysrq_trigger+0x24/0x40
> >>>>>>>>>>>>> [1107585.968463]  proc_reg_write+0x51/0x90
> >>>>>>>>>>>>> [1107585.972290]  vfs_write+0xc3/0x280
> >>>>>>>>>>>>> [1107585.975768]  ksys_write+0x5f/0xe0
> >>>>>>>>>>>>> [1107585.979248]  do_syscall_64+0x33/0x40
> >>>>>>>>>>>>> [1107585.982987]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >>>>>>>>>>>>> [1107585.988199] RIP: 0033:0x7f5873932133
> >>>>>>>>>>>>> [1107585.991938] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 41 54 49 89 d4 55 48 89 f5
> >>>>>>>>>>>>> [1107586.010842] RSP: 002b:00007ffcc13808c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >>>>>>>>>>>>> [1107586.018566] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5873932133
> >>>>>>>>>>>>> [1107586.025923] RDX: 0000000000000002 RSI: 00000000005c1c08 RDI: 0000000000000001
> >>>>>>>>>>>>> [1107586.033213] RBP: 00000000005c1c08 R08: 000000000000000a R09: 00007f58739c2f40
> >>>>>>>>>>>>> [1107586.040504] R10: 00000000005cc348 R11: 0000000000000246 R12: 0000000000000002
> >>>>>>>>>>>>> [1107586.047794] R13: 00007f5873a00520 R14: 00007f5873a00720 R15: 0000000000000002
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Nothing obvious to me here … if you have any further ideas what 
> >>>>>>>>>>>>> to test, let me know. I should be more responsive again now.
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Thanks and kind regards,
> >>>>>>>>>>>>> Christian
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>>> On 5. Mar 2023, at 23:53, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> On Wed, Mar 01, 2023 at 06:00:07PM +0100, Christian Theune 
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> I’m going to actually attach a serial console to watch the 
> >>>>>>>>>>>>>>> “echo c” panic, maybe that gives _some_ indication.
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Otherwise: I can quickly run patches on the kernel there to 
> >>>>>>>>>>>>>>> try out things. (And the funding offer still stands.)
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Any news on this?  I'm curious what this could be.
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> -corey
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>> Christian
> >>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> On 1. Mar 2023, at 17:58, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 06:36:17PM +0100, Christian Theune 
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Thanks, both machines report:
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> # cat /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>>>>>> string
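
For reference, a minimal sketch of reading that parameter and spelling out what each value means. The three value names (none, event, string) are the ones described in the kernel's IPMI driver documentation; the helper name `describe_panic_op` is made up for illustration.

```shell
#!/bin/sh
# Map a panic_op value to a short description of what the IPMI message
# handler does on a kernel panic.
describe_panic_op() {
    case "$1" in
        none)   echo "no panic information is logged" ;;
        event)  echo "a panic event is logged to the SEL" ;;
        string) echo "a panic event plus the panic string in OEM events" ;;
        *)      echo "unrecognized value: $1" ;;
    esac
}

param=/sys/module/ipmi_msghandler/parameters/panic_op
if [ -r "$param" ]; then
    describe_panic_op "$(cat "$param")"
else
    # The file only exists while ipmi_msghandler is loaded or built in.
    echo "$param not readable (is ipmi_msghandler loaded?)"
fi
```

With the value reported above ("string"), both the standard panic event and the OEM string events would be expected in the SEL after a crash.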
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> At this point, I have no idea.  I'd have to start adding printks into
> >>>>>>>>>>>>>>>> the code and cause crashes to see what is happening.
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> Maybe something is getting in the way of the panic notifiers and doing
> >>>>>>>>>>>>>>>> something to prevent the IPMI driver from working.
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>> -corey
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>> On 28. Feb 2023, at 18:04, Corey Minyard <miny...@acm.org> 
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>> Oh, I forgot.  You can look at panic_op in 
> >>>>>>>>>>>>>>>>>> /sys/module/ipmi_msghandler/parameters/panic_op
> >>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>> -corey
> >>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 05:48:07PM +0100, Christian Theune 
> >>>>>>>>>>>>>>>>>> via Openipmi-developer wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> On 28. Feb 2023, at 17:36, Corey Minyard 
> >>>>>>>>>>>>>>>>>>>> <miny...@acm.org> wrote:
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> On Tue, Feb 28, 2023 at 02:53:12PM +0100, Christian 
> >>>>>>>>>>>>>>>>>>>> Theune via Openipmi-developer wrote:
> >>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>>> I’ve been trying to debug the PANIC and OEM string 
> >>>>>>>>>>>>>>>>>>>>> handling and am running out of ideas whether this is a 
> >>>>>>>>>>>>>>>>>>>>> bug or whether something so subtle has changed in my 
> >>>>>>>>>>>>>>>>>>>>> config that I’m just not seeing it.
> >>>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>>> (Note: I’m willing to pay for consulting.)
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> Probably not necessary.
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> Thanks! The offer always stands. If we should ever meet 
> >>>>>>>>>>>>>>>>>>> I’m also able to pay in beverages. ;)
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>>> I have machines that we’ve moved from an older setup 
> >>>>>>>>>>>>>>>>>>>>> (Gentoo, (mostly) vanilla kernel 4.19.157) to a newer 
> >>>>>>>>>>>>>>>>>>>>> setup (NixOS, (mostly) vanilla kernel 5.10.159) and I’m 
> >>>>>>>>>>>>>>>>>>>>> now experiencing crashes that seem to be kernel panics 
> >>>>>>>>>>>>>>>>>>>>> but do not get the usual messages in the IPMI SEL.
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> I just tested on stock 5.10.159 and it worked without issue.
> >>>>>>>>>>>>>>>>>>>> Everything you have below looks ok.
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> Can you test by causing a crash with:
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> echo c >/proc/sysrq-trigger
> >>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> and see if it works?
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> Yeah, already tried that and unfortunately that _doesn’t_ 
> >>>>>>>>>>>>>>>>>>> work.
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> It sounds like you are having some type of crash that you would
> >>>>>>>>>>>>>>>>>>>> normally use the IPMI logs to debug.  However, they aren't perfect;
> >>>>>>>>>>>>>>>>>>>> the system has to stay up long enough to get them into the event log.
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> I think they are staying up long enough, because a panic 
> >>>>>>>>>>>>>>>>>>> triggers the 255-second bump in the watchdog and the machines 
> >>>>>>>>>>>>>>>>>>> only reset after that. However, I’ve also noticed that the 
> >>>>>>>>>>>>>>>>>>> kernel _should_ be rebooting after a panic much faster (and not 
> >>>>>>>>>>>>>>>>>>> rely on the watchdog) and that doesn’t happen either. 
> >>>>>>>>>>>>>>>>>>> (Sorry, this just popped up from the back of my head.)
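
On the slow reboot: the kernel only reboots on its own after a panic if the panic timeout (the kernel.panic sysctl, or panic= on the kernel command line) is non-zero. A small sketch for checking what the running system is set to; `describe_panic_timeout` is a made-up helper name, and this only reports the setting, it does not explain why a given panic did not reboot.

```shell
#!/bin/sh
# Describe the meaning of a kernel.panic value:
#   0        -> the kernel does not reboot by itself
#   negative -> reboot immediately
#   positive -> reboot after that many seconds
describe_panic_timeout() {
    case "$1" in
        0)  echo "no automatic reboot after a panic" ;;
        -*) echo "immediate reboot after a panic" ;;
        *)  echo "reboot $1 seconds after a panic" ;;
    esac
}

f=/proc/sys/kernel/panic
if [ -r "$f" ]; then
    describe_panic_timeout "$(cat "$f")"
else
    echo "$f not readable"
fi
```

If this reports no automatic reboot, the 255-second watchdog expiry would be the only thing resetting the machine.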
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> In this situation, getting a serial console (probably through IPMI
> >>>>>>>>>>>>>>>>>>>> Serial over LAN) and getting the console output on a crash is
> >>>>>>>>>>>>>>>>>>>> probably your best option.  You can use ipmitool for this, or I have
> >>>>>>>>>>>>>>>>>>>> a library that is able to make connections to serial ports,
> >>>>>>>>>>>>>>>>>>>> including through IPMI SoL.
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> Yup. Been there, too. :)
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> Unfortunately we’re currently chasing something that pops 
> >>>>>>>>>>>>>>>>>>> up very randomly on somewhat odd machines and I also have 
> >>>>>>>>>>>>>>>>>>> the feeling that it’s systematically broken right now (as 
> >>>>>>>>>>>>>>>>>>> the “echo c” doesn’t work).
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> Thanks a lot,
> >>>>>>>>>>>>>>>>>>> Christian
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> -- 
> >>>>>>>>>>>>>>>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> >>>>>>>>>>>>>>>>>>> Flying Circus Internet Operations GmbH · 
> >>>>>>>>>>>>>>>>>>> https://flyingcircus.io
> >>>>>>>>>>>>>>>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> >>>>>>>>>>>>>>>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, 
> >>>>>>>>>>>>>>>>>>> Christian Zagrodnick
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>>>>>> Openipmi-developer mailing list
> >>>>>>>>>>>>>>>>>>> Openipmi-developer@lists.sourceforge.net
> >>>>>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>> Christian Theune
> >>>>>>>>>>>>>>>>> 
> >>> <0001-ipmi-watchdog-Set-panic-count-to-proper-value-on-a-p.patch>
> > 
> > <0001-ipmi-watchdog-replace-atomic_add-and-atomic_sub.patch>


