Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:50, Klaus Wenninger wrote:
>>
>> From the documentation it is not clear to me whether this would be:
>> a) multi-level fencing where ipmi would be the first level and sbd the
>> second level (where sbd always succeeds)
>> b) or this is considered single-level fencing with a timeout
> 
> With b), falling back to watchdog-fencing wouldn't work properly, although
> I remember some recent change that might make it fall back without issues.

b) works here:

Jun 07 17:35:50 ha2 pacemaker-controld[7069]:  notice: Requesting
fencing (reboot) of node qnetd

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Client
pacemaker-controld.7069 wants to fence (reboot) qnetd using any device

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Requesting peer
fencing (reboot) targeting qnetd

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: watchdog is not
eligible to fence (reboot) qnetd: static-list

Jun 07 17:35:50 ha2 pacemaker-schedulerd[7068]:  warning: Calculated
transition 14 (with warnings), saving inputs in
/var/lib/pacemaker/pengine/pe-warn-95.bz2

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Requesting that ha1
perform 'reboot' action targeting qnetd

Jun 07 17:35:53 ha2 pacemaker-fenced[7065]:  notice: Requesting that ha2
perform 'reboot' action targeting qnetd

Jun 07 17:35:53 ha2 pacemaker-fenced[7065]:  notice: watchdog is not
eligible to fence (reboot) qnetd: static-list

Jun 07 17:35:55 ha2 stonith[11138]: external_reset_req: '_dummy reset'
for host qnetd failed with rc 1

Jun 07 17:35:57 ha2 stonith[11142]: external_reset_req: '_dummy reset'
for host qnetd failed with rc 1

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  error: Operation 'reboot'
[11141] targeting qnetd using dummy_stonith returned 1

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  warning:
dummy_stonith[11141] [ Performing: stonith -t external/_dummy -E -T
reset qnetd ]

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  warning:
dummy_stonith[11141] [ failed: qnetd 5 ]

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  notice: Couldn't find
anyone to fence (reboot) qnetd using any device

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  notice: Waiting 10s for
qnetd to self-fence (reboot) for client pacemaker-controld.7069

Jun 07 17:36:07 ha2 pacemaker-fenced[7065]:  notice: Self-fencing
(reboot) by qnetd for pacemaker-controld.7069 assumed complete

Jun 07 17:36:07 ha2 pacemaker-fenced[7065]:  notice: Operation 'reboot'
targeting qnetd by ha2 for pacemaker-controld.7069@ha2: OK (complete)

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Fence operation 7
for qnetd passed

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Transition 14
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-95.bz2): Complete

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: State transition
S_TRANSITION_ENGINE -> S_IDLE

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Peer qnetd was
terminated (reboot) by ha2 on behalf of pacemaker-controld.7069@ha2: OK
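
For context, the kind of minimal configuration that exercises this b) path looks
roughly like the following (pcs syntax assumed; the fence agent, node name and
timeout value are illustrative, not the exact setup used above):

---
# any regular fence device that claims the node, plus the watchdog timeout property
sudo pcs stonith create dummy_stonith fence_dummy pcmk_host_list="qnetd"
sudo pcs property set stonith-watchdog-timeout=20
---

With no explicit watchdog device configured, pacemaker falls back to the
stonith-watchdog-timeout self-fence assumption once the regular device fails,
which is what the log above appears to show.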



The only gotcha is this stray error after everything has already completed.


Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  notice: Peer's 'reboot'
action targeting qnetd for client pacemaker-controld.7069 timed out

Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  notice: Couldn't find
anyone to fence (reboot) qnetd using any device

Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  error:
request_peer_fencing: Triggered fatal assertion at fenced_remote.c:1799
: op->state < st_done


> I would try to go for a): with a reasonably current pacemaker version
> (iirc 2.1.0 and above) you should be able to make the watchdog-fencing
> device visible like any other fencing device

Yep.

dummy_stonith

watchdog

2 fence devices found
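
Output like the above is what e.g. "stonith_admin --list-registered" prints -
that this was the command used here is my assumption. A minimal sketch of
registering the watchdog device so that it shows up as a regular fence device
(node names are illustrative):

---
sudo pcs stonith create watchdog fence_watchdog pcmk_host_list="ha1 ha2"
sudo stonith_admin --list-registered
---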



> (just use fence_watchdog as the fence agent - it is still implemented inside
> pacemaker; the fence_watchdog binary actually just provides the meta-data).
> Like this you can limit watchdog-fencing to certain nodes that actually
> provide a proper hardware watchdog, and you can add it to a topology.
> 

Well, as can be seen above, even though "watchdog" is not eligible,
pacemaker is still using it, so I am not sure it will work.


Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:26, Zoran Bošnjak wrote:
> 
> In the test scenario, the dummy resource is currently running on node1. I 
> have simulated node failure by unplugging the ipmi AND host network 
> interfaces from node1. The result was that node1 gets rebooted (by watchdog), 
> but the rest of the pacemaker cluster was unable to fence node1 (this is 
> expected, since node1's ipmi is not accessible). The problem is that dummy 
> resource remains stopped and node1 unclean. I was expecting that 
> stonith-watchdog-timeout kicks in, so that dummy resource gets restarted on 
> some other node which has quorum. 
> 

I cannot reproduce it; watchdog fencing works here as expected.

> Obviously there is something wrong with my configuration, since this seems to 
> be a reasonably simple scenario for the pacemaker. Appreciate your help.
> 

It is impossible to say anything without logs.


[ClusterLabs] Antw: [EXT] fencing configuration

2022-06-07 Thread Ulrich Windl
>>> Zoran Bošnjak wrote on 07.06.2022 at 10:26 in
message <1951254459.265.1654590407828.javamail.zim...@via.si>:
> Hi, I need some help with a correct fencing configuration in a 5-node cluster.
> 
> The specific issue is that there are 3 rooms, where in addition to the node
> failure scenario, each room can fail too (for example in case of room power
> failure or room network failure).
> 
> room0: [ node0 ]
> roomA: [ node1, node2 ]
> roomB: [ node3, node4 ]

First, it's good that even after a complete room has failed, you will still
have quorum.

> 
> - ipmi board is present on each node
> - watchdog timer is available
> - shared storage is not available

The last one sounds adventurous to me, but I'll read on...

> 
> Please advise what would be a proper fencing configuration in this case.

sbd using shared storage ;-)

> 
> The intention is to configure ipmi fencing (using the "fence_idrac" agent) plus
> watchdog timer as a fallback. In other words, I would like to tell the
> pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi
> fence failure, after some timeout assume the watchdog has rebooted the node, so
> it is safe to proceed, as if the (self)fencing had succeeded."

An interesting question would be how to reach any node in a room if that room
failed.
A perfect solution would be to have shared storage in every room and
configure 3-way sbd disks.
In addition you could use three-way mirroring of your data, just to be
paranoid ;-)
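
If shared storage can be made available after all, a rough sketch of such a
three-device sbd setup (device paths are purely illustrative, one disk per room;
the config file is /etc/sysconfig/sbd or /etc/default/sbd depending on the
distribution):

---
# initialize one sbd disk per room, then list all three on every node
sudo sbd -d /dev/disk/by-id/sbd-room0 -d /dev/disk/by-id/sbd-roomA -d /dev/disk/by-id/sbd-roomB create
# in the sbd config file:
#   SBD_DEVICE="/dev/disk/by-id/sbd-room0;/dev/disk/by-id/sbd-roomA;/dev/disk/by-id/sbd-roomB"
# poison-pill fencing device for the cluster
sudo pcs stonith create sbd-fencing fence_sbd \
    devices="/dev/disk/by-id/sbd-room0,/dev/disk/by-id/sbd-roomA,/dev/disk/by-id/sbd-roomB"
---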

> 
> From the documentation it is not clear to me whether this would be:
> a) multi-level fencing where ipmi would be the first level and sbd the
> second level (where sbd always succeeds)
> b) or this is considered single-level fencing with a timeout
> 
> I have tried to follow option b) and create a stonith resource for each node
> and set up the stonith-watchdog-timeout, like this:
> 
> ---
> # for each node... [0..4]
> export name=...
> export ip=...
> export password=...
> sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
> lanplus=1 ip="$ip" \
> username="admin"  password="$password" \
> pcmk_host_list="$name" op monitor interval=10m timeout=10s
> 
> sudo pcs property set stonith-watchdog-timeout=20
> 
> # start dummy resource
> sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
> ---
> 
> I am not sure if additional location constraints have to be specified for 
> stonith resources. For example: I have noticed that pacemaker will start a 
> stonith resource on the same node as the fencing target. Is this OK? 
> 
> Should there be any location constraints regarding fencing and rooms?
> 
> 'sbd' is running, properties are as follows:
> 
> ---
> $ sudo pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: debian
>  dc-version: 2.0.3-4b1f869f0f
>  have-watchdog: true
>  last-lrm-refresh: 1654583431
>  stonith-enabled: true
>  stonith-watchdog-timeout: 20
> ---
> 
> Ipmi fencing (when the ipmi connection is alive) works correctly for each
> node. The watchdog timer also seems to be working correctly. The problem is
> that the dummy resource is not restarted as expected.

My favourite here is "crm_mon -1Arfj" ;-)

> 
> In the test scenario, the dummy resource is currently running on node1. I 
> have simulated node failure by unplugging the ipmi AND host network 
> interfaces from node1. The result was that node1 gets rebooted (by watchdog),
> but the rest of the pacemaker cluster was unable to fence node1 (this is 
> expected, since node1's ipmi is not accessible). The problem is that dummy 
> resource remains stopped and node1 unclean. I was expecting that 

"unclean" means fencing is either in progress, or did not succeed (like when
you have no fencing at all).

> stonith-watchdog-timeout kicks in, so that the dummy resource gets restarted on
> some other node which has quorum.

So that actually does the fencing. Logs could be interesting to read, too.

> 
> Obviously there is something wrong with my configuration, since this seems 
> to be a reasonably simple scenario for the pacemaker. Appreciate your help.

See above.

Regards,
Ulrich




Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 10:27 AM Zoran Bošnjak  wrote:
>
> Hi, I need some help with a correct fencing configuration in a 5-node cluster.
>
> The specific issue is that there are 3 rooms, where in addition to the node
> failure scenario, each room can fail too (for example in case of room power
> failure or room network failure).
>
> room0: [ node0 ]
> roomA: [ node1, node2 ]
> roomB: [ node3, node4 ]
>
> - ipmi board is present on each node
> - watchdog timer is available
> - shared storage is not available
>
> Please advise what would be a proper fencing configuration in this case.
>
> The intention is to configure ipmi fencing (using the "fence_idrac" agent) plus
> watchdog timer as a fallback. In other words, I would like to tell the
> pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi
> fence failure, after some timeout assume the watchdog has rebooted the node, so
> it is safe to proceed, as if the (self)fencing had succeeded."
>
> From the documentation it is not clear to me whether this would be:
> a) multi-level fencing where ipmi would be the first level and sbd the
> second level (where sbd always succeeds)
> b) or this is considered single-level fencing with a timeout

With b), falling back to watchdog-fencing wouldn't work properly, although
I remember some recent change that might make it fall back without issues.
I would try to go for a): with a reasonably current pacemaker version
(iirc 2.1.0 and above) you should be able to make the watchdog-fencing
device visible like any other fencing device
(just use fence_watchdog as the fence agent - it is still implemented inside
pacemaker; the fence_watchdog binary actually just provides the meta-data).
Like this you can limit watchdog-fencing to certain nodes that actually
provide a proper hardware watchdog, and you can add it to a topology.
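
A rough sketch of what a) could look like with pcs (this assumes per-node ipmi
devices such as "fence_ipmi_node1" from the configuration quoted below, plus a
fence_watchdog device named "watchdog" restricted via pcmk_host_list - all
names are illustrative):

---
# level 1: try ipmi first; level 2: fall back to watchdog-fencing
sudo pcs stonith level add 1 node1 fence_ipmi_node1
sudo pcs stonith level add 2 node1 watchdog
# repeat for every node that has a proper hardware watchdog
---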

Depending on your infrastructure, an alternative to watchdog-fencing for your
case (where you can't access the ipmis in a room with a power outage) might be
fabric-fencing.

Klaus
>
> I have tried to follow option b) and create a stonith resource for each node
> and set up the stonith-watchdog-timeout, like this:
>
> ---
> # for each node... [0..4]
> export name=...
> export ip=...
> export password=...
> sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
> lanplus=1 ip="$ip" \
> username="admin"  password="$password" \
> pcmk_host_list="$name" op monitor interval=10m timeout=10s
>
> sudo pcs property set stonith-watchdog-timeout=20
>
> # start dummy resource
> sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
> ---
>
> I am not sure if additional location constraints have to be specified for 
> stonith resources. For example: I have noticed that pacemaker will start a 
> stonith resource on the same node as the fencing target. Is this OK?
>
> Should there be any location constraints regarding fencing and rooms?
>
> 'sbd' is running, properties are as follows:
>
> ---
> $ sudo pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: debian
>  dc-version: 2.0.3-4b1f869f0f
>  have-watchdog: true
>  last-lrm-refresh: 1654583431
>  stonith-enabled: true
>  stonith-watchdog-timeout: 20
> ---
>
> Ipmi fencing (when the ipmi connection is alive) works correctly for each 
> node. The watchdog timer also seems to be working correctly. The problem is 
> that dummy resource is not restarted as expected.
>
> In the test scenario, the dummy resource is currently running on node1. I 
> have simulated node failure by unplugging the ipmi AND host network 
> interfaces from node1. The result was that node1 gets rebooted (by watchdog), 
> but the rest of the pacemaker cluster was unable to fence node1 (this is 
> expected, since node1's ipmi is not accessible). The problem is that dummy 
> resource remains stopped and node1 unclean. I was expecting that 
> stonith-watchdog-timeout kicks in, so that dummy resource gets restarted on 
> some other node which has quorum.
>
> Obviously there is something wrong with my configuration, since this seems to 
> be a reasonably simple scenario for the pacemaker. Appreciate your help.
>
> regards,
> Zoran
>



Re: [ClusterLabs] Antw: [EXT] Re: normal reboot with active sbd does not work

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 7:53 AM Ulrich Windl
 wrote:
>
> >>> Andrei Borzenkov wrote on 03.06.2022 at 17:04 in
> message <99f7746a-c962-33bb-6737-f88ba0128...@gmail.com>:
> > On 03.06.2022 16:51, Zoran Bošnjak wrote:
> >> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed
> >> OK. I was first experimenting with "softdog", which is blacklisted. So the
> >> reasonable question is how to properly start "softdog" on ubuntu.
> >>
> >
> > blacklist prevents autoloading of modules by alias during hardware
> > detection. Neither softdog nor ipmi_watchdog has any alias, so they
> > cannot be autoloaded and blacklist is irrelevant here.
> >
> >> The reason to unload watchdog module (ipmi or softdog) is that there seems
> >> to be a difference between normal reboot and watchdog reboot.
> >> In case of ipmi watchdog timer reboot:
> >> - the system hangs at the end of reboot cycle for some time
> >> - restart seems to be harder (like power off/on cycle), BIOS runs more
> >>   diagnostics at startup
>
> maybe kdump is enabled in that case?
>
> >> - it turns on HW diagnostic indication on the server front panel (dell
> >>   server) which stays on forever
> >> - it logs the event to IDRAC, which is unnecessary, because it was not a
> >>   hardware event, but just a normal reboot
>
> If the hardware watchdog times out and fires, it is considered to be an
> exceptional event that will be logged and reported.
>
> >>
> >> In case of "sudo reboot" command, I would like to skip this... so the idea
> >> is to fully stop the watchdog just before reboot. I am not sure how to do
> >> this properly.
> >>
> >> The "softdog" is better in this respect. It does not trigger anything from
> >> the list above, but I still get the message during reboot
> >> [ ... ] watchdog: watchdog0: watchdog did not stop!
> >> ... with some small timeout.
> >>
> >
> > The first obvious question - is there only one watchdog? Some watchdog
> > drivers *are* autoloaded.
> >
> > Is there only one user of the watchdog? systemd may use it too, for example.
>
> Don't mix timers with a watchdog: It makes little sense to have multiple
> watchdogs enabled IMHO.

Yep, that is an issue atm when you have multiple users of a hardware watchdog,
like the watchdog daemon, sbd, corosync, systemd, ...

I'm not aware of an implementation that would provide multiple watchdog-timers
with the usual char-device interface out of one physical watchdog.
Of course this should be relatively easy to implement - even in user-space.
On our embedded devices we usually had something like a service that
would offer multiple timers to other instances.
The implementation of that service itself was guarded by a hardware watchdog
so that the derived timers would be as reliable as a hardware watchdog.
The last implementation was built into the watchdog daemon and offered a
dbus-interface.
What systemd has implemented is similarly interesting.
The current systemd implementation has a suspicious loop around it that prevents
it from being fit for sbd purposes, as it doesn't guarantee a reboot within
a reasonably short time.
This is why I haven't yet implemented the systemd file-descriptor approach
in sbd (as a configurable alternative to going for the device directly).
Approaching the systemd guys and asking why it is implemented as it is has
been on my todo-list for a while now.

If you are running multiple services on a host that don't offer something
like a common supervision main-loop, it may make sense to provide a common
instance that offers something like a watchdog-service.
For a node that has all services under pacemaker control this shouldn't be
needed, as we have sbd observing pacemakerd. Pacemakerd in turn
observes the other pacemaker subdaemons (released with RHEL 8.6 and
iirc 2.1.3 upstream), guaranteeing that the monitors on the resources don't
get stuck.

Klaus
>
> >
> >> So after some additional testing, the situation is the following:
> >>
> >> - without any watchdog and without sbd package, the server reboots normally
> >> - with "softdog" module loaded, I only get "watchdog did not stop message"
> >>   at reboot
> >> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is
> >>   normal again
> >> - same as above, but with "sbd" package loaded, I am getting "watchdog did
> >>   not stop message" again
> >> - switching from "softdog" to "ipmi_watchdog" gets me to the original list
> >>   of problems
> >>
> >> It looks like the "sbd" is preventing the watchdog from closing, so that the
> >> watchdog always triggers, even in the case of normal reboot. What am I
> >> missing here?
>
> The watchdog may have a "no way out" parameter that prevents disabling it
> once it has been enabled.
>
> >
> > While the only way I can reproduce it on my QEMU VM is "reboot -f"
> > (without stopping all services), there is certainly a race condition in
> > sbd.service.
> >
> > ExecStop=@bindir@/kill -TERM $MAINPID
> >
> >
> > systemd will continue as soon as "kill" completes without waiting for
> > sbd 

[ClusterLabs] fencing configuration

2022-06-07 Thread Zoran Bošnjak
Hi, I need some help with a correct fencing configuration in a 5-node cluster.

The specific issue is that there are 3 rooms, where in addition to the node
failure scenario, each room can fail too (for example in case of room power
failure or room network failure).

room0: [ node0 ]
roomA: [ node1, node2 ]
roomB: [ node3, node4 ]

- ipmi board is present on each node
- watchdog timer is available
- shared storage is not available

Please advise what would be a proper fencing configuration in this case.

The intention is to configure ipmi fencing (using the "fence_idrac" agent) plus
watchdog timer as a fallback. In other words, I would like to tell the
pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi
fence failure, after some timeout assume the watchdog has rebooted the node, so it
is safe to proceed, as if the (self)fencing had succeeded."

From the documentation it is not clear to me whether this would be:
a) multi-level fencing where ipmi would be the first level and sbd the second
level (where sbd always succeeds)
b) or this is considered single-level fencing with a timeout

I have tried to follow option b) and create a stonith resource for each node
and set up the stonith-watchdog-timeout, like this:

---
# for each node... [0..4]
export name=...
export ip=...
export password=...
sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
lanplus=1 ip="$ip" \
username="admin"  password="$password" \
pcmk_host_list="$name" op monitor interval=10m timeout=10s

sudo pcs property set stonith-watchdog-timeout=20

# start dummy resource
sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
---

I am not sure if additional location constraints have to be specified for 
stonith resources. For example: I have noticed that pacemaker will start a 
stonith resource on the same node as the fencing target. Is this OK? 

Should there be any location constraints regarding fencing and rooms?

'sbd' is running, properties are as follows:

---
$ sudo pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: debian
 dc-version: 2.0.3-4b1f869f0f
 have-watchdog: true
 last-lrm-refresh: 1654583431
 stonith-enabled: true
 stonith-watchdog-timeout: 20
---

Ipmi fencing (when the ipmi connection is alive) works correctly for each node. 
The watchdog timer also seems to be working correctly. The problem is that
the dummy resource is not restarted as expected.

In the test scenario, the dummy resource is currently running on node1. I have 
simulated node failure by unplugging the ipmi AND host network interfaces from 
node1. The result was that node1 gets rebooted (by watchdog), but the rest of 
the pacemaker cluster was unable to fence node1 (this is expected, since 
node1's ipmi is not accessible). The problem is that the dummy resource remains
stopped and node1 stays unclean. I was expecting that stonith-watchdog-timeout
would kick in, so that the dummy resource gets restarted on some other node
which has quorum.

Obviously there is something wrong with my configuration, since this seems to 
be a reasonably simple scenario for the pacemaker. Appreciate your help.

regards,
Zoran