On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
<ulrich.wi...@rz.uni-regensburg.de> wrote:
>
> Hi!
>
> After reading about fence_kdump and fence_kdump_send I wonder:
> Does anybody use that in production?

Quite a lot of people, in fact.

> Having the networking and bonding in initrd does not sound like a good idea 
> to me.
> Wouldn't it be easier to integrate that functionality into sbd?
> I mean: Let sbd wait for a "kdump-ed" message that initrd could send when 
> kdump is complete.
> Basically that would be the same mechanism, but using storage instead of 
> networking.
>
> If I get it right, the original fence_kdump would also introduce an extra 
> fencing delay, and I wonder what happens with a hardware watchdog while a 
> kdump is in progress...
>
> The background of all this is that our nodes kernel-panic, and support says 
> the kdumps are all incomplete.
> The events are most likely:
> node1: panics (kdump)
> other_node: sees node1 had failed and fences it (via sbd).
>
> However sbd fencing won't work while kdump is executing (IMHO)
>
> So what happens most likely is that the watchdog terminates the kdump.
> In that case all the mess with fence_kdump won't help, right?

You can configure extra_modules in your /etc/kdump.conf file to
include the watchdog module, and then restart kdump.service. For
example:

# grep ^extra_modules /etc/kdump.conf
extra_modules i6300esb

If you're not sure of the name of your watchdog module, wdctl can help
you find it. sbd needs to be stopped first, because it keeps the
watchdog device timer busy.

# pcs cluster stop --all
# wdctl | grep Identity
Identity:      i6300ESB timer [version 0]
# lsmod | grep -i i6300ESB
i6300esb               13566  0


If you're also using fence_sbd (poison-pill fencing via block device),
then you should be able to protect yourself from that during a dump by
configuring fencing levels so that fence_kdump is level 1 and
fence_sbd is level 2.
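
As a rough sketch of what that level setup could look like with pcs:
the script below only prints the "pcs stonith level add" commands so
they can be reviewed before anything is run. It assumes the stonith
resources already exist; the resource IDs "kdump" and "sbd" and the
node names node1/node2 are placeholders I made up, so substitute your
own.

```shell
#!/bin/sh
# Dry run: print the pcs commands instead of executing them.
# KDUMP_ID and SBD_ID are hypothetical stonith resource IDs (assumed to
# be existing fence_kdump and fence_sbd resources); adjust to your cluster.
KDUMP_ID=kdump
SBD_ID=sbd

print_level_cmds() {
    for node in "$@"; do
        # Level 1: fence_kdump, tried first for this node.
        echo "pcs stonith level add 1 $node $KDUMP_ID"
        # Level 2: fence_sbd, tried only if level 1 fails or times out.
        echo "pcs stonith level add 2 $node $SBD_ID"
    done
}

# Placeholder node names.
print_level_cmds node1 node2
```

With that in place, the poison pill should only be written if
fence_kdump does not hear from the crashed node within its timeout.
"pcs stonith level" should show the resulting configuration.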


>
> Regards,
> Ulrich
>
>
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

