[ClusterLabs] Pacemaker 2.1.4-rc1 now available

2022-06-03 Thread Ken Gaillot
Hi all,

The first (and likely only) release candidate for Pacemaker 2.1.4 is
now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.4-rc1

This is a bug fix release due to a couple of regressions being found in
2.1.3. Since there are very few changes compared to 2.1.3, the final
release will probably be made in the next week or two.

For more details, please see the above link.

Everyone is encouraged to download, compile and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release,
including Chris Lumens, Ken Gaillot, Petr Pavlu, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
On 03.06.2022 16:51, Zoran Bošnjak wrote:
> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed 
> OK. I was first experimenting with "softdog", which is blacklisted. So the 
> reasonable question is how to properly start "softdog" on ubuntu.
> 

A blacklist entry only prevents a module from being autoloaded by alias
during hardware detection. Neither softdog nor ipmi_watchdog has an alias,
so neither can be autoloaded, and the blacklist is irrelevant here.

> The reason to unload watchdog module (ipmi or softdog) is that there seems to 
> be a difference between normal reboot and watchdog reboot.
> In case of ipmi watchdog timer reboot:
> - the system hangs at the end of reboot cycle for some time
> - restart seems to be harder (like power off/on cycle), BIOS runs more 
> diagnostics at startup
> - it turns on HW diagnostic indication on the server front panel (dell 
> server) which stays on forever
> - it logs the event to IDRAC, which is unnecessary, because it was not a 
> hardware event, but just a normal reboot
> 
> In case of "sudo reboot" command, I would like to skip this... so the idea is 
> to fully stop the watchdog just before reboot. I am not sure how to do this 
> properly.
> 
> The "softdog" is better in this respect. It does not trigger nothing from the 
> list above, but I still get the message during reboot
> [ ... ] watchdog: watchdog0: watchdog did not stop!
> ... with some small timeout.
> 

The first obvious question: is there only one watchdog? Some watchdog
drivers *are* autoloaded.

Is there only one user of the watchdog? systemd may use it too, for example.
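
One way to answer both questions is to enumerate the watchdog device nodes
first. The following is only a minimal sketch; the function name and its
optional directory argument are made up for illustration and are not part of
sbd or systemd.

```shell
#!/bin/sh
# List watchdog device nodes under a /dev-like directory (default: /dev).
# The directory argument exists only so the function can be tried on a
# scratch directory; it is a hypothetical helper, not an existing tool.
list_watchdogs() {
    devdir=${1:-/dev}
    for node in "$devdir"/watchdog*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$node" ] || continue
        printf '%s\n' "$node"
    done
}
```

With the list in hand, something like "fuser -v /dev/watchdog" (from psmisc,
run as root) should show which process holds the device open, and
"systemctl show -p RuntimeWatchdogUSec" should show whether systemd itself
has been configured to use a watchdog. Treat both as suggestions to verify
on your own system.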

> So after some additional testing, the situation is the following:
> 
> - without any watchdog and without sbd package, the server reboots normally
> - with "softdog" module loaded, I only get "watchdog did not stop message" at 
> reboot
> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is 
> normal again
> - same as above, but with "sbd" package loaded, I am getting "watchdog did 
> not stop message" again
> - switching from "softdog" to "ipmi_watchdog" gets me to the original list of 
> problems
> 
> It looks like the "sbd" is preventing the watchdog to close, so that watchdog 
> triggers always, even in the case of normal reboot. What am I missing here?

While the only way I can reproduce it on my QEMU VM is "reboot -f"
(without stopping all services), there is certainly a race condition in
sbd.service.

ExecStop=@bindir@/kill -TERM $MAINPID


systemd continues as soon as "kill" completes, without waiting for
sbd to actually stop. This means systemd may finish the shutdown sequence
before sbd has had a chance to react to the signal, and then simply kill
it, which leaves the watchdog armed.

For testing purposes, try using a script for ExecStop that loops until sbd
has actually stopped.

Note that systemd strongly recommends using a synchronous command for
ExecStop (one may argue that this should be handled by the service manager
itself, but well ...).
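
Such a test script could look roughly like the sketch below. It is not taken
from the sbd sources; the function name, the 30 second default and the
polling interval are assumptions chosen for illustration.

```shell
#!/bin/sh
# Send SIGTERM to a process and block until it is gone or a timeout expires.
# Usage: stop_and_wait PID [TIMEOUT_SECONDS]
# Returns 0 once the process no longer exists, 1 if it is still there
# after the timeout. Hypothetical helper for testing, not part of sbd.
stop_and_wait() {
    pid=$1
    timeout=${2:-30}
    # If the signal cannot be delivered, assume the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 delivers no signal; it only tests whether the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # One last check after the final sleep.
    kill -0 "$pid" 2>/dev/null || return 0
    return 1
}
```

Saved as a script that ends with a call such as stop_and_wait "$@", it could
be referenced from a drop-in as ExecStop=/path/to/script $MAINPID (the path
is a placeholder). The unit's stop timeout still applies on top of the
timeout used here.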

> 
> Zoran
> 
> - Original Message -
> From: "Andrei Borzenkov" 
> To: "users" 
> Sent: Friday, June 3, 2022 11:24:03 AM
> Subject: Re: [ClusterLabs] normal reboot with active sbd does not work
> 
> On 03.06.2022 11:18, Zoran Bošnjak wrote:
>> Hi all,
>> I would appreciate an advice about sbd fencing (without shared storage).
>>
>> I am using ubuntu 20.04., with default packages from the repository 
>> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>>
>> HW watchdog is present on servers. The first problem was to load/unload the 
>> watchdog module. For some reason the module is blacklisted on ubuntu,
> 
> What makes you think so?
> 
> bor@bor-Latitude-E5450:~$ lsb_release  -d
> 
> Description:  Ubuntu 20.04.4 LTS
> 
> bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
> 
> bor@bor-Latitude-E5450:~$
> 
> 
> 
> 
> 
>> so I've created a service for this purpose.
>>
> 
> man modules-load.d
> 
> 
>> --- file: /etc/systemd/system/watchdog.service
>> [Unit]
>> Description=Load watchdog timer module
>> After=syslog.target
>>
> 
> Without any explicit dependencies stop will be attempted as soon as
> possible.
> 
>> [Service]
>> Type=oneshot
>> RemainAfterExit=yes
>> ExecStart=/sbin/modprobe ipmi_watchdog
>> ExecStop=/sbin/rmmod ipmi_watchdog
>>
> 
> Why on earth do you need to unload kernel driver when system reboots?
> 
>> [Install]
>> WantedBy=multi-user.target
>> ---
>>
>> Is this a proper way to load watchdog module under ubuntu?
>>
> 
> There is standard way to load non-autoloaded drivers on *any* systemd
> based distribution. Which is modules-load.d.
> 
>> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
>> 'sbd') is present.
>> Next, the 'sbd' is installed by
>>
>> sudo apt install sbd
>> (followed by one reboot to get the sbd active)
>>
>> The configuration of the 'sbd' is default. The sbd reacts to network failure 
>> as expected (reboots the server). However, when the 'sbd' is active, the 
>> server won't reboot normally any more. For

[ClusterLabs] Pacemaker 2.1.3 release has regression, 2.1.4 coming soon

2022-06-03 Thread Ken Gaillot
Hi all,

The just-released Pacemaker 2.1.3 had an unfortunate combination of two
unrelated regressions, one in the target-attribute feature for fencing
devices (which allows a fence device to target nodes that have a
specified node attribute set), and one in the test suite for that
feature, which is why it wasn't caught before release.

A 2.1.4 release with the fix should be available next week.

In the meantime, 2.1.3 is perfectly fine for clusters that don't use
target-attribute.
-- 
Ken Gaillot 



Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 3:51 PM Zoran Bošnjak  wrote:
>
> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed 
> OK. I was first experimenting with "softdog", which is blacklisted. So the 
> reasonable question is how to properly start "softdog" on ubuntu.
>
> The reason to unload watchdog module (ipmi or softdog) is that there seems to 
> be a difference between normal reboot and watchdog reboot.
> In case of ipmi watchdog timer reboot:
> - the system hangs at the end of reboot cycle for some time
> - restart seems to be harder (like power off/on cycle), BIOS runs more 
> diagnostics at startup
> - it turns on HW diagnostic indication on the server front panel (dell 
> server) which stays on forever
> - it logs the event to IDRAC, which is unnecessary, because it was not a 
> hardware event, but just a normal reboot
>
> In case of "sudo reboot" command, I would like to skip this... so the idea is 
> to fully stop the watchdog just before reboot. I am not sure how to do this 
> properly.
>
> The "softdog" is better in this respect. It does not trigger nothing from the 
> list above, but I still get the message during reboot
> [ ... ] watchdog: watchdog0: watchdog did not stop!
> ... with some small timeout.
>
> So after some additional testing, the situation is the following:
>
> - without any watchdog and without sbd package, the server reboots normally
> - with "softdog" module loaded, I only get "watchdog did not stop message" at 
> reboot
> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is 
> normal again
> - same as above, but with "sbd" package loaded, I am getting "watchdog did 
> not stop message" again
> - switching from "softdog" to "ipmi_watchdog" gets me to the original list of 
> problems
>
> It looks like the "sbd" is preventing the watchdog to close, so that watchdog 
> triggers always, even in the case of normal reboot. What am I missing here?

sbd has the watchdog device open and thus prevents the module from being
unloaded. Without any ordering instructions in your unit file, systemd will
try to stop the unit immediately, and the stop will fail.
Have you tried

[Unit]
Before=sbd.service

[Install]
RequiredBy=sbd.service

I would have expected that rebooting with the device disabled again
after sbd shuts down should behave similarly to rebooting with the
module unloaded.
You could check for something like 'nowayout' on the kernel module, which
would prevent disabling the watchdog once it has been opened.
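
A quick way to look for such a setting is to scan sysfs, as in the sketch
below. Which of these files exist depends on the kernel configuration and on
the driver, so an empty result does not prove that 'nowayout' is off; the
function name and the optional root argument are invented for this example.

```shell
#!/bin/sh
# Print every nowayout setting found under a sysfs-like tree (default: /sys).
# Output format: <path>=<value>
# Hypothetical helper; the optional argument only allows a dry run on a
# scratch directory.
show_nowayout() {
    sysroot=${1:-/sys}
    for f in "$sysroot"/class/watchdog/*/nowayout \
             "$sysroot"/module/*/parameters/nowayout; do
        # Skip unmatched globs and files that cannot be read.
        [ -r "$f" ] || continue
        printf '%s=%s\n' "$f" "$(cat "$f")"
    done
}
```

If nothing is printed, "modinfo -p softdog" (or the same for ipmi_watchdog)
at least lists the parameters the module accepts.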

Klaus
>
> Zoran
>
> - Original Message -
> From: "Andrei Borzenkov" 
> To: "users" 
> Sent: Friday, June 3, 2022 11:24:03 AM
> Subject: Re: [ClusterLabs] normal reboot with active sbd does not work
>
> On 03.06.2022 11:18, Zoran Bošnjak wrote:
> > Hi all,
> > I would appreciate an advice about sbd fencing (without shared storage).
> >
> > I am using ubuntu 20.04., with default packages from the repository 
> > (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > HW watchdog is present on servers. The first problem was to load/unload the 
> > watchdog module. For some reason the module is blacklisted on ubuntu,
>
> What makes you think so?
>
> bor@bor-Latitude-E5450:~$ lsb_release  -d
>
> Description:  Ubuntu 20.04.4 LTS
>
> bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
>
> bor@bor-Latitude-E5450:~$
>
>
>
>
>
> > so I've created a service for this purpose.
> >
>
> man modules-load.d
>
>
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
>
> Without any explicit dependencies stop will be attempted as soon as
> possible.
>
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
>
> Why on earth do you need to unload kernel driver when system reboots?
>
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this a proper way to load watchdog module under ubuntu?
> >
>
> There is standard way to load non-autoloaded drivers on *any* systemd
> based distribution. Which is modules-load.d.
>
> > Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> > 'sbd') is present.
> > Next, the 'sbd' is installed by
> >
> > sudo apt install sbd
> > (followed by one reboot to get the sbd active)
> >
> > The configuration of the 'sbd' is default. The sbd reacts to network 
> > failure as expected (reboots the server). However, when the 'sbd' is 
> > active, the server won't reboot normally any more. For example from the 
> > command line "sudo reboot", it gets stuck at the end of the reboot 
> > sequence. There is a message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After some long timeout, it looks like the watchdog timer expires 

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Zoran Bošnjak
Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed OK. 
I was first experimenting with "softdog", which is blacklisted. So the 
reasonable question is how to properly start "softdog" on ubuntu.

The reason to unload watchdog module (ipmi or softdog) is that there seems to 
be a difference between normal reboot and watchdog reboot.
In case of ipmi watchdog timer reboot:
- the system hangs at the end of reboot cycle for some time
- restart seems to be harder (like power off/on cycle), BIOS runs more 
diagnostics at startup
- it turns on HW diagnostic indication on the server front panel (dell server) 
which stays on forever
- it logs the event to IDRAC, which is unnecessary, because it was not a 
hardware event, but just a normal reboot

In case of "sudo reboot" command, I would like to skip this... so the idea is 
to fully stop the watchdog just before reboot. I am not sure how to do this 
properly.

The "softdog" is better in this respect. It does not trigger anything from the 
list above, but I still get this message during reboot:
[ ... ] watchdog: watchdog0: watchdog did not stop!
... with some small timeout.

So after some additional testing, the situation is the following:

- without any watchdog and without the sbd package, the server reboots normally
- with the "softdog" module loaded, I only get the "watchdog did not stop" 
message at reboot
- with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is 
normal again
- same as above, but with the "sbd" package installed, I am getting the 
"watchdog did not stop" message again
- switching from "softdog" to "ipmi_watchdog" gets me back to the original list 
of problems

It looks like "sbd" is preventing the watchdog from closing, so that the 
watchdog always triggers, even in the case of a normal reboot. What am I 
missing here?

Zoran

- Original Message -
From: "Andrei Borzenkov" 
To: "users" 
Sent: Friday, June 3, 2022 11:24:03 AM
Subject: Re: [ClusterLabs] normal reboot with active sbd does not work

On 03.06.2022 11:18, Zoran Bošnjak wrote:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
> 
> I am using ubuntu 20.04., with default packages from the repository 
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> 
> HW watchdog is present on servers. The first problem was to load/unload the 
> watchdog module. For some reason the module is blacklisted on ubuntu,

What makes you think so?

bor@bor-Latitude-E5450:~$ lsb_release  -d

Description:  Ubuntu 20.04.4 LTS

bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog

bor@bor-Latitude-E5450:~$





> so I've created a service for this purpose.
>

man modules-load.d


> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
> 

Without any explicit dependencies stop will be attempted as soon as
possible.

> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
> 

Why on earth do you need to unload kernel driver when system reboots?

> [Install]
> WantedBy=multi-user.target
> ---
> 
> Is this a proper way to load watchdog module under ubuntu?
> 

There is standard way to load non-autoloaded drivers on *any* systemd
based distribution. Which is modules-load.d.

> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
> 
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
> 
> The configuration of the 'sbd' is default. The sbd reacts to network failure 
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
> 
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
> 
> After some long timeout, it looks like the watchdog timer expires and server 
> boots, but the failure indication remains on the front panel of the server. 
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.
> 

As the first step - do not unload watchdog driver on shutdown.


Re: [ClusterLabs] Antw: [EXT] normal reboot with active sbd does not work

2022-06-03 Thread Zoran Bošnjak
Yes, it's a Dell PowerEdge. Would you know how to disable the front panel 
indication in case of a watchdog reset?

"echo V >/dev/watchdog" makes no difference.

- Original Message -
From: "Ulrich Windl" 
To: "users" 
Sent: Friday, June 3, 2022 11:00:18 AM
Subject: [ClusterLabs] Antw: [EXT] normal reboot with active sbd does not work

>>> Zoran Bošnjak wrote on 03.06.2022 at 10:18 in
message <2046503996.272.1654244336372.javamail.zim...@via.si>:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).

Not an answer, but a question out of curiosity:
As sbd needs very little space (just about 1MB), did anybody ever try to use a
small computer like a Raspberry Pi to provide shared storage for SBD, via iSCSI
for example?
The disk could be a partition of the flash card (it's written quite rarely).

...
> After some long timeout, it looks like the watchdog timer expires and server
> boots, but the failure indication remains on the front panel of the server.


Dell PowerEdge? ;-)

In SLES I have these settings (among others):
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=30
SBD_TIMEOUT_ACTION=flush,reboot

I did:
h16:~ # echo iTCO_wdt > /etc/modules-load.d/watchdog.conf
h16:~ # systemctl restart systemd-modules-load
h16:~ # lsmod | egrep "(wd|dog)"
iTCO_wdt               16384  0
iTCO_vendor_support    16384  1 iTCO_wdt

Later I changed it to:
h16:~ # echo ipmi_watchdog > /etc/modules-load.d/watchdog.conf
h16:~ # systemctl restart systemd-modules-load

After reboot there was a conflict:
Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: cannot register miscdev on minor=130 (err=-16).
Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: a legacy watchdog module is probably present.
h16:~ # lsmod | grep wd
wdat_wdt   20480  0
h16:~ # modprobe -r wdat_wdt
h16:~ # modprobe ipmi_watchdog
h16:~ # lsmod | grep wat
ipmi_watchdog  32768  1
ipmi_msghandler   114688  4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif

h16:/etc/modprobe.d # cat 99-local.conf
#
# please add local extensions to this file
#
h16:/etc/modprobe.d # echo 'blacklist wdat_wdt' >> 99-local.conf

Maybe also check whether "echo V >/dev/watchdog" will stop the watchdog
properly. SUSE (and meanwhile upstream, I guess) had to fix it.

Hope this helps a bit.

Regards,
Ulrich

> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.
> 
> regards,
> Zoran


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
On 03.06.2022 11:18, Zoran Bošnjak wrote:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
> 
> I am using ubuntu 20.04., with default packages from the repository 
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> 
> HW watchdog is present on servers. The first problem was to load/unload the 
> watchdog module. For some reason the module is blacklisted on ubuntu,

What makes you think so?

bor@bor-Latitude-E5450:~$ lsb_release  -d

Description:  Ubuntu 20.04.4 LTS

bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog

bor@bor-Latitude-E5450:~$





> so I've created a service for this purpose.
>

man modules-load.d


> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
> 

Without any explicit dependencies stop will be attempted as soon as
possible.

> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
> 

Why on earth do you need to unload kernel driver when system reboots?

> [Install]
> WantedBy=multi-user.target
> ---
> 
> Is this a proper way to load watchdog module under ubuntu?
> 

There is standard way to load non-autoloaded drivers on *any* systemd
based distribution. Which is modules-load.d.

> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
> 
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
> 
> The configuration of the 'sbd' is default. The sbd reacts to network failure 
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
> 
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
> 
> After some long timeout, it looks like the watchdog timer expires and server 
> boots, but the failure indication remains on the front panel of the server. 
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.
> 

As the first step - do not unload watchdog driver on shutdown.


[ClusterLabs] Antw: [EXT] Re: normal reboot with active sbd does not work

2022-06-03 Thread Ulrich Windl
>>> Klaus Wenninger wrote on 03.06.2022 at 11:03:
> On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak  wrote:
...
> still opened by sbd. In general I don't see why the watchdog-module should
> be unloaded upon shutdown. So as a first try you just might remove that 

Specifically if the actual watchdog is a hardware timer that isn't stopped when
the module is unloaded.

> part.
> 
> Klaus
> 
>>
>> regards,
>> Zoran


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 11:03 AM Klaus Wenninger  wrote:
>
> On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak  wrote:
> >
> > Hi all,
> > I would appreciate an advice about sbd fencing (without shared storage).
> >
> > I am using ubuntu 20.04., with default packages from the repository 
> > (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > HW watchdog is present on servers. The first problem was to load/unload the 
> > watchdog module. For some reason the module is blacklisted on ubuntu, so 
> > I've created a service for this purpose.
> >
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this a proper way to load watchdog module under ubuntu?
> >
> > Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> > 'sbd') is present.
> > Next, the 'sbd' is installed by
> >
> > sudo apt install sbd
> > (followed by one reboot to get the sbd active)
> >
> > The configuration of the 'sbd' is default. The sbd reacts to network 
> > failure as expected (reboots the server). However, when the 'sbd' is 
> > active, the server won't reboot normally any more. For example from the 
> > command line "sudo reboot", it gets stuck at the end of the reboot 
> > sequence. There is a message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After some long timeout, it looks like the watchdog timer expires and 
> > server boots, but the failure indication remains on the front panel of the 
> > server. If I uninstall the 'sbd' package, the "sudo reboot" works normally 
> > again.
> >
> > My question is: How do I configure the system, to have the 'sbd' function 
> > present, but still be able to reboot the system normally.
>
> Loading modules - depending on distribution and version - should probably
> rather be done by editing /etc/modules or putting some files under
> /etc/modprobe.d/.
Of course that would require removing the driver from the blacklist.
Any reason why you didn't consider that?
> Guess in your case stopping the unit won't work as the watchdog-device is
> still opened by sbd. In general I don't see why the watchdog-module should
> be unloaded upon shutdown. So as a first try you just might remove that part.
>
> Klaus
>
> >
> > regards,
> > Zoran


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak  wrote:
>
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
>
> I am using ubuntu 20.04., with default packages from the repository 
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>
> HW watchdog is present on servers. The first problem was to load/unload the 
> watchdog module. For some reason the module is blacklisted on ubuntu, so I've 
> created a service for this purpose.
>
> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
>
> [Install]
> WantedBy=multi-user.target
> ---
>
> Is this a proper way to load watchdog module under ubuntu?
>
> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
>
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
>
> The configuration of the 'sbd' is default. The sbd reacts to network failure 
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
>
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
>
> After some long timeout, it looks like the watchdog timer expires and server 
> boots, but the failure indication remains on the front panel of the server. 
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
>
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.

Loading modules - depending on distribution and version - should probably rather
be done by editing /etc/modules or putting some files under /etc/modprobe.d/.
I guess in your case stopping the unit won't work, as the watchdog device is
still held open by sbd. In general I don't see why the watchdog module should
be unloaded upon shutdown. So as a first try you might just remove that part.

Klaus

>
> regards,
> Zoran


[ClusterLabs] Antw: [EXT] normal reboot with active sbd does not work

2022-06-03 Thread Ulrich Windl
>>> Zoran Bošnjak wrote on 03.06.2022 at 10:18 in
message <2046503996.272.1654244336372.javamail.zim...@via.si>:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).

Not an answer, but a question out of curiosity:
As sbd needs very little space (just about 1MB), did anybody ever try to use a
small computer like a Raspberry Pi to provide shared storage for SBD, via iSCSI
for example?
The disk could be a partition of the flash card (it's written quite rarely).

...
> After some long timeout, it looks like the watchdog timer expires and server
> boots, but the failure indication remains on the front panel of the server.


Dell PowerEdge? ;-)

In SLES I have these settings (among others):
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=30
SBD_TIMEOUT_ACTION=flush,reboot

I did:
h16:~ # echo iTCO_wdt > /etc/modules-load.d/watchdog.conf
h16:~ # systemctl restart systemd-modules-load
h16:~ # lsmod | egrep "(wd|dog)"
iTCO_wdt               16384  0
iTCO_vendor_support    16384  1 iTCO_wdt

Later I changed it to:
h16:~ # echo ipmi_watchdog > /etc/modules-load.d/watchdog.conf
h16:~ # systemctl restart systemd-modules-load

After reboot there was a conflict:
Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: cannot register miscdev on minor=130 (err=-16).
Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: a legacy watchdog module is probably present.
h16:~ # lsmod | grep wd
wdat_wdt   20480  0
h16:~ # modprobe -r wdat_wdt
h16:~ # modprobe ipmi_watchdog
h16:~ # lsmod | grep wat
ipmi_watchdog  32768  1
ipmi_msghandler   114688  4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif

h16:/etc/modprobe.d # cat 99-local.conf
#
# please add local extensions to this file
#
h16:/etc/modprobe.d # echo 'blacklist wdat_wdt' >> 99-local.conf

Maybe also check whether "echo V >/dev/watchdog" will stop the watchdog
properly. SUSE (and meanwhile upstream, I guess) had to fix it.
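
For context, the 'V' is the "magic close" character of the Linux watchdog
interface: a driver that supports it stops the timer on close only if 'V' was
written beforehand, and not at all when 'nowayout' is set. A minimal sketch of
the check, with a made-up function name:

```shell
#!/bin/sh
# Open a watchdog device, write the magic character 'V', and close it again.
# WARNING: opening a real watchdog device arms the timer. If the driver does
# not honour the magic close (or nowayout is set), the machine will reset
# when the timer expires. Hypothetical helper for testing only.
magic_close() {
    dev=${1:-/dev/watchdog}
    # The redirection opens the device, printf writes a single 'V', and the
    # descriptor is closed when the command finishes.
    printf 'V' > "$dev"
}
```

This only works while no other process holds the device: with sbd running,
the open is expected to fail because the device is busy. After the call, the
kernel log should show whether the watchdog was stopped.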

Hope this helps a bit.

Regards,
Ulrich

> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.
> 
> regards,
> Zoran


[ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Zoran Bošnjak
Hi all,
I would appreciate some advice about sbd fencing (without shared storage).

I am using Ubuntu 20.04 with default packages from the repository (pacemaker, 
corosync, fence-agents, ipmitool, pcs...).

A HW watchdog is present on the servers. The first problem was to load/unload 
the watchdog module. For some reason the module is blacklisted on Ubuntu, so 
I've created a service for this purpose.

--- file: /etc/systemd/system/watchdog.service
[Unit]
Description=Load watchdog timer module
After=syslog.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe ipmi_watchdog
ExecStop=/sbin/rmmod ipmi_watchdog

[Install]
WantedBy=multi-user.target
---

Is this a proper way to load watchdog module under ubuntu?

Anyway, once the module is loaded, the /dev/watchdog (which is required by 
'sbd') is present.
Next, the 'sbd' is installed by

sudo apt install sbd
(followed by one reboot to get the sbd active)

The configuration of 'sbd' is the default. sbd reacts to a network failure as 
expected (reboots the server). However, when 'sbd' is active, the server 
won't reboot normally any more. For example, after "sudo reboot" from the 
command line, it gets stuck at the end of the reboot sequence. There is a 
message on the console:

... reboot progress
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
[ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
[ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
... it gets stuck at this point

After some long timeout, it looks like the watchdog timer expires and the 
server boots, but the failure indication remains on the front panel of the 
server. If I uninstall the 'sbd' package, "sudo reboot" works normally again.

My question is: how do I configure the system to have the 'sbd' function 
present, but still be able to reboot the system normally?

regards,
Zoran