Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sat, Nov 5, 2022 at 9:45 PM Jehan-Guillaume de Rorthais via Users < users@clusterlabs.org> wrote: > On Sat, 5 Nov 2022 20:53:09 +0100 > Valentin Vidić via Users wrote: > > > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote: > > > That was my impression as well...so I may have something wrong. My > > > expectation was that SBD daemon should be writing to the /dev/watchdog > > > within 20 seconds and the kernel watchdog would self fence. > > > > I don't see anything unusual in the config except that pacemaker mode is > > also enabled. This means that the cluster is providing signal for sbd > even > > when the storage device is down, for example: > > > > 883 ?SL 0:00 sbd: inquisitor > > 892 ?SL 0:00 \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: > ... > > 893 ?SL 0:00 \_ sbd: watcher: Pacemaker > > 894 ?SL 0:00 \_ sbd: watcher: Cluster > > > > You can strace different sbd processes to see what they are doing at any > > point. > > I suspect both watchers should detect the loss of network/communication > with > the other node. > > BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the > local **Pacemaker** is still quorate (via corosync). See the full chapter: > «If Pacemaker integration is activated, SBD will not self-fence if > **device** > majority is lost [...]» > > https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html > > Would it be possible that no node is shutting down because the cluster is > in > two-node mode? Because of this mode, both would keep the quorum expecting > the > fencing to kill the other one... Except there's no active fencing here, > only > "self-fencing". > Seems not to be the case here but for completeness: This fact should be recognized automatically by sbd (upstream since some time in 2017 iirc) and instead of checking quorum sbd would then check for presence of 2 nodes with the cpg-group. I hope corosync prevents 2-node & qdevice set at the same time. 
But even in that case I would rather expect unexpected self-fencing instead of the opposite.

Klaus

> To verify this guess, check the corosync conf for the "two_node" parameter and
> if both nodes still report as quorate during network outage using:
>
> corosync-quorumtool -s
>
> If this turns out to be a good guess, without **active** fencing, I suppose a
> cluster cannot rely on the two-node mode. I'm not sure what would be the best
> setup though.
>
> Regards,

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
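For reference, both settings discussed above (two-node mode and a quorum device) live in the corosync configuration. A hypothetical `corosync.conf` quorum section, illustrative only — the host address and algorithm are assumptions, not taken from the thread:

```
quorum {
    provider: corosync_votequorum
    # two_node: 1   # two-node mode; should not be combined with a qdevice
    device {
        votes: 1
        model: net
        net {
            host: 192.0.2.10      # example qnetd server address
            algorithm: ffsplit    # resolves 50/50 splits in favor of one side
        }
    }
}
```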
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Wed, Nov 9, 2022 at 2:58 PM Robert Hayden wrote:
> > -----Original Message-----
> > From: Users On Behalf Of Andrei Borzenkov
> > Sent: Wednesday, November 9, 2022 2:59 AM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden wrote:
> > > > -----Original Message-----
> > > > From: Users On Behalf Of Valentin Vidic via Users
> > > > Sent: Sunday, November 6, 2022 5:20 PM
> > > > To: users@clusterlabs.org
> > > > Cc: Valentin Vidić
> > > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> > > >
> > > > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote:
> > > > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity
> > > > > to the node would be seen and acted upon by the remote nodes (evicts
> > > > > and takes over ownership of the resources). But the impacted node
> > > > > would just sit logging IO errors. Pacemaker would keep updating the
> > > > > /dev/watchdog device so SBD would not self evict. Once I re-enabled
> > > > > the network, then the
> > > >
> > > > Interesting, not sure if this is the expected behaviour based on:
> > > >
> > > > https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html

Which versions of pacemaker/corosync/sbd are you using?

iirc, a result of the discussion linked was sbd checking watchdog-timeout against sync-timeout in case of qdevice being used. The default sync-timeout is 30s and your watchdog-timeout is 20s, so I would expect a reasonably current sbd to refuse startup. But iirc in the discussion linked the pacemaker node finally became non-quorate; there was just a possible split-brain gap when sync-timeout > watchdog-timeout. So if your pacemaker instance stays quorate, it has to be something else.

> > > > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > > > some other messages related to Pacemaker?
> > >
> > > Yes.
> > >
> > > > Also what is the status of Pacemaker when the network is down? Does it
> > > > report no quorum or something else?
> > >
> > > Pacemaker on the failing node shows quorum even though it has lost
> > > communication to the Quorum Device and to the other node in the cluster.
> > > The non-failing node of the cluster can see the Quorum Device system and
> > > thus correctly determines to fence the failing node and take over its
> > > resources.

Hmm ... maybe some problem with the qdevice setup and/or quorum strategy (LMS for instance). If quorum doesn't work properly, your cluster won't work properly, regardless of whether sbd kills the node properly or not.

> > > Only after I run firewall-cmd --panic-off, will the failing node start to log
> > > messages about loss of TOTEM and getting a new consensus with the
> > > now visible members.
> >
> > Where exactly do you use firewalld panic mode? You have hosts, you
> > have VM, you have qnode ...
> >
> > Have you verified that the network is blocked bidirectionally? I had
> > rather mixed experience with asymmetrical firewalls which resembles
> > your description.
>
> In my testing harness, I will send a script to the remote node which
> contains the firewall-cmd --panic-on, a sleep command, and then
> turn off the panic mode. That way I can adjust the length of time
> network is unavailable on a single node. I used to log into a network
> switch to turn ports off, but that is not possible in a Cloud environment.
> I have also played with manually creating iptables rules, but the panic mode
> is simply easier and accomplishes the task.
> > I have verified that when panic mode is on, no inbound or outbound > network traffic is allowed. This includes iSCSI packets as well. You > better > have access to the console or the ability to reset the system. > > > > > > Also it may depend on the corosync driver in use. >
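The sync-timeout check Klaus describes above reduces to simple arithmetic. The sketch below is an illustration only — the real check (if present) lives inside sbd itself, and the function name is an assumption:

```shell
#!/usr/bin/env bash
# Hedged sketch: with a qdevice, a sync-timeout longer than the watchdog
# timeout leaves a possible split-brain window, so startup should be
# refused. Numbers from the thread: sync-timeout 30s (default), watchdog 20s.
check_timeouts() {
    local watchdog="$1"   # SBD_WATCHDOG_TIMEOUT, seconds
    local sync="$2"       # qdevice sync timeout, seconds
    if [ "$sync" -gt "$watchdog" ]; then
        echo "refuse startup: sync ${sync}s > watchdog ${watchdog}s"
        return 1
    fi
    echo "ok"
}

check_timeouts 20 30 || true   # the thread's values would be rejected
```

With the values reported in the thread (watchdog 20s, sync 30s) the check fails, which matches Klaus's expectation that a current sbd should refuse to start.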
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Wednesday, November 9, 2022 2:59 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden > wrote: > > > > > > > -Original Message- > > > From: Users On Behalf Of Valentin > Vidic > > > via Users > > > Sent: Sunday, November 6, 2022 5:20 PM > > > To: users@clusterlabs.org > > > Cc: Valentin Vidić > > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > > > > > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote: > > > > When SBD_PACEMAKER was set to "yes", the lack of network > connectivity > > > to the node > > > > would be seen and acted upon by the remote nodes (evicts and takes > > > > over ownership of the resources). But the impacted node would just > > > > sit logging IO errors. Pacemaker would keep updating the > /dev/watchdog > > > > device so SBD would not self evict. Once I re-enabled the network, > then > > > the > > > > > > Interesting, not sure if this is the expected behaviour based on: > > > > > > > https://urldefense.com/v3/__https://lists.clusterlabs.org/pipermail/users/2 > > > 017- > > > > August/022699.html__;!!ACWV5N9M2RV99hQ!IvnnhGI1HtTBGTKr4VFabWA > > > LeMfBWNhcS0FHsPFHwwQ3Riu5R3pOYLaQPNia- > > > GaB38wRJ7Eq4Q3GyT5C3s8y7w$ > > > > > > Does SBD log "Majority of devices lost - surviving on pacemaker" or > > > some other messages related to Pacemaker? > > > > Yes. > > > > > > > > Also what is the status of Pacemaker when the network is down? Does it > > > report no quorum or something else? > > > > > > > Pacemaker on the failing node shows quorum even though it has lost > > communication to the Quorum Device and to the other node in the cluster. > > The non-failing node of the cluster can see the Quorum Device system and > > thus correctly determines to fence the failing node and take over its > > resources. 
> > Only after I run firewall-cmd --panic-off, will the failing node start to log
> > messages about loss of TOTEM and getting a new consensus with the
> > now visible members.
>
> Where exactly do you use firewalld panic mode? You have hosts, you
> have VM, you have qnode ...
>
> Have you verified that the network is blocked bidirectionally? I had
> rather mixed experience with asymmetrical firewalls which resembles
> your description.

In my testing harness, I will send a script to the remote node which
contains the firewall-cmd --panic-on, a sleep command, and then
turn off the panic mode. That way I can adjust the length of time
network is unavailable on a single node. I used to log into a network
switch to turn ports off, but that is not possible in a Cloud environment.
I have also played with manually creating iptables rules, but the panic mode
is simply easier and accomplishes the task.

I have verified that when panic mode is on, no inbound or outbound
network traffic is allowed. This includes iSCSI packets as well. You better
have access to the console or the ability to reset the system.

> Also it may depend on the corosync driver in use.

> > I think all of that explains the lack of self-fencing when the sbd setting of
> > SBD_PACEMAKER=yes is used.
>
> Correct. This means that at least under some conditions
> pacemaker/corosync fail to detect isolation.
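The harness step described above (panic on, sleep, panic off) might look like the following sketch. The function name and the `FIREWALL_CMD` override hook are illustrative assumptions, not code from the thread:

```shell
#!/usr/bin/env bash
# Sketch of the outage harness described above: block all traffic with
# firewalld panic mode for a chosen number of seconds, then restore it.
# FIREWALL_CMD is an assumed override so the sketch can be dry-run.
net_outage() {
    local secs="${1:?usage: net_outage SECONDS}"
    ${FIREWALL_CMD:-firewall-cmd} --panic-on    # drop all in/out traffic
    sleep "$secs"                               # outage window under test
    ${FIREWALL_CMD:-firewall-cmd} --panic-off   # restore networking
}
```

Such a script has to run detached on the target node (e.g. pushed over ssh and started with nohup), since the ssh session itself dies the moment panic mode engages.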
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden wrote: > > > > -Original Message- > > From: Users On Behalf Of Valentin Vidic > > via Users > > Sent: Sunday, November 6, 2022 5:20 PM > > To: users@clusterlabs.org > > Cc: Valentin Vidić > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > > > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote: > > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity > > to the node > > > would be seen and acted upon by the remote nodes (evicts and takes > > > over ownership of the resources). But the impacted node would just > > > sit logging IO errors. Pacemaker would keep updating the /dev/watchdog > > > device so SBD would not self evict. Once I re-enabled the network, then > > the > > > > Interesting, not sure if this is the expected behaviour based on: > > > > https://urldefense.com/v3/__https://lists.clusterlabs.org/pipermail/users/2 > > 017- > > August/022699.html__;!!ACWV5N9M2RV99hQ!IvnnhGI1HtTBGTKr4VFabWA > > LeMfBWNhcS0FHsPFHwwQ3Riu5R3pOYLaQPNia- > > GaB38wRJ7Eq4Q3GyT5C3s8y7w$ > > > > Does SBD log "Majority of devices lost - surviving on pacemaker" or > > some other messages related to Pacemaker? > > Yes. > > > > > Also what is the status of Pacemaker when the network is down? Does it > > report no quorum or something else? > > > > Pacemaker on the failing node shows quorum even though it has lost > communication to the Quorum Device and to the other node in the cluster. > The non-failing node of the cluster can see the Quorum Device system and > thus correctly determines to fence the failing node and take over its > resources. > > Only after I run firewall-cmd --panic-off, will the failing node start to log > messages about loss of TOTEM and getting a new consensus with the > now visible members. > Where exactly do you use firewalld panic mode? You have hosts, you have VM, you have qnode ... Have you verified that the network is blocked bidirectionally? 
I had rather mixed experience with asymmetrical firewalls which resembles
your description.

Also it may depend on the corosync driver in use.

> I think all of that explains the lack of self-fencing when the sbd setting of
> SBD_PACEMAKER=yes is used.

Correct. This means that at least under some conditions
pacemaker/corosync fail to detect isolation.
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Mon, 7 Nov 2022 14:06:51 + Robert Hayden wrote:
> > -----Original Message-----
> > From: Users On Behalf Of Valentin Vidic via Users
> > Sent: Sunday, November 6, 2022 5:20 PM
> > To: users@clusterlabs.org
> > Cc: Valentin Vidić
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote:
> > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity
> > > to the node would be seen and acted upon by the remote nodes (evicts
> > > and takes over ownership of the resources). But the impacted node would
> > > just sit logging IO errors. Pacemaker would keep updating the /dev/watchdog
> > > device so SBD would not self evict. Once I re-enabled the network, then the
> >
> > Interesting, not sure if this is the expected behaviour based on:
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html
> >
> > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > some other messages related to Pacemaker?
>
> Yes.
>
> > Also what is the status of Pacemaker when the network is down? Does it
> > report no quorum or something else?
>
> Pacemaker on the failing node shows quorum even though it has lost
> communication to the Quorum Device and to the other node in the cluster.

This is the main issue. Maybe inspecting the corosync-cmapctl output could
shed some light on some setup we are missing?

> The non-failing node of the cluster can see the Quorum Device system and
> thus correctly determines to fence the failing node and take over its
> resources.

Normal.

> Only after I run firewall-cmd --panic-off, will the failing node start to log
> messages about loss of TOTEM and getting a new consensus with the
> now visible members.
> I think all of that explains the lack of self-fencing when the sbd setting of
> SBD_PACEMAKER=yes is used.

I'm not sure. If I understand correctly, SBD_PACEMAKER=yes only instructs sbd
to keep an eye on the pacemaker+corosync processes (as described up thread).
It doesn't explain why Pacemaker keeps holding the quorum, but I might be
missing something...
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -Original Message- > From: Users On Behalf Of Valentin Vidic > via Users > Sent: Sunday, November 6, 2022 5:20 PM > To: users@clusterlabs.org > Cc: Valentin Vidić > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote: > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity > to the node > > would be seen and acted upon by the remote nodes (evicts and takes > > over ownership of the resources). But the impacted node would just > > sit logging IO errors. Pacemaker would keep updating the /dev/watchdog > > device so SBD would not self evict. Once I re-enabled the network, then > the > > Interesting, not sure if this is the expected behaviour based on: > > https://urldefense.com/v3/__https://lists.clusterlabs.org/pipermail/users/2 > 017- > August/022699.html__;!!ACWV5N9M2RV99hQ!IvnnhGI1HtTBGTKr4VFabWA > LeMfBWNhcS0FHsPFHwwQ3Riu5R3pOYLaQPNia- > GaB38wRJ7Eq4Q3GyT5C3s8y7w$ > > Does SBD log "Majority of devices lost - surviving on pacemaker" or > some other messages related to Pacemaker? Yes. > > Also what is the status of Pacemaker when the network is down? Does it > report no quorum or something else? > Pacemaker on the failing node shows quorum even though it has lost communication to the Quorum Device and to the other node in the cluster. The non-failing node of the cluster can see the Quorum Device system and thus correctly determines to fence the failing node and take over its resources. Only after I run firewall-cmd --panic-off, will the failing node start to log messages about loss of TOTEM and getting a new consensus with the now visible members. I think all of that explains the lack of self-fencing when the sbd setting of SBD_PACEMAKER=yes is used. 
> --
> Valentin
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote:
> When SBD_PACEMAKER was set to "yes", the lack of network connectivity to the
> node would be seen and acted upon by the remote nodes (evicts and takes
> over ownership of the resources). But the impacted node would just
> sit logging IO errors. Pacemaker would keep updating the /dev/watchdog
> device so SBD would not self evict. Once I re-enabled the network, then the

Interesting, not sure if this is the expected behaviour based on:

https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html

Does SBD log "Majority of devices lost - surviving on pacemaker" or
some other messages related to Pacemaker?

Also what is the status of Pacemaker when the network is down? Does it
report no quorum or something else?

--
Valentin
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -Original Message- > From: Jehan-Guillaume de Rorthais > Sent: Saturday, November 5, 2022 4:18 PM > To: Robert Hayden > Cc: users@clusterlabs.org > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > On Sat, 5 Nov 2022 20:54:55 + > Robert Hayden wrote: > > > > -Original Message- > > > From: Jehan-Guillaume de Rorthais > > > Sent: Saturday, November 5, 2022 3:45 PM > > > To: users@clusterlabs.org > > > Cc: Robert Hayden > > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > > > > > On Sat, 5 Nov 2022 20:53:09 +0100 > > > Valentin Vidić via Users wrote: > > > > > > > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote: > > > > > That was my impression as well...so I may have something wrong. My > > > > > expectation was that SBD daemon should be writing to the > > > /dev/watchdog > > > > > within 20 seconds and the kernel watchdog would self fence. > > > > > > > > I don't see anything unusual in the config except that pacemaker mode > is > > > > also enabled. This means that the cluster is providing signal for sbd > > > > even > > > > when the storage device is down, for example: > > > > > > > > 883 ?SL 0:00 sbd: inquisitor > > > > 892 ?SL 0:00 \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: > > > > ... > > > > 893 ?SL 0:00 \_ sbd: watcher: Pacemaker > > > > 894 ?SL 0:00 \_ sbd: watcher: Cluster > > > > > > > > You can strace different sbd processes to see what they are doing at > any > > > > point. > > > > > > I suspect both watchers should detect the loss of > network/communication > > > with > > > the other node. > > > > > > BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the > > > local **Pacemaker** is still quorate (via corosync). 
See the full chapter: > > > «If Pacemaker integration is activated, SBD will not self-fence if > > > **device** majority is lost [...]» > > > https://urldefense.com/v3/__https://documentation.suse.com/sle- > ha/15- > > > SP4/html/SLE-HA-all/cha-ha-storage- > > > > protect.html__;!!ACWV5N9M2RV99hQ!LXxpjg0QHdAP0tvr809WCErcpPH0lx > > > MKesDNqK-PU_Xpvb_KIGlj3uJcVLIbzQLViOi3EiSV3bkPUCHr$ > > > > > > Would it be possible that no node is shutting down because the cluster is > in > > > two-node mode? Because of this mode, both would keep the quorum > > > expecting the > > > fencing to kill the other one... Except there's no active fencing here, > > > only > > > "self-fencing". > > > > > > > I failed to mention I also have a Quorum Device also setup to add its vote > > to > > the quorum. So two_node is not enabled. > > oh, ok. > > > I suspect Valentin was onto to something with pacemaker keeping the > watchdog > > device updated as it thinks the cluster is ok. Need to research and test > > that theory out. I will try to carve some time out next week for that. > > AFAIK, Pacemaker strictly rely on SBD to deal with the watchdog. It doesn't > feed > it by itself. > > In Pacemaker mode, SBD is watching the two most important part of the > cluster: > Pacemaker and Corosync: > > * the "Pacemaker watcher" of SBD connects to the CIB and check it's still > updated on a regular basis and the self-node is marked online. > * the "Cluster watchers" all connect with each others using a dedicated > communication group in corosync ring(s). > > Both watchers can report a failure to SBD that would self-stop the node. > > If the network if down, I suppose the cluster watcher should complain. But I > suspect Pacemaker somehow keeps reporting as quorate, thus, forbidding > SBD to > kill the whole node... I was able to reset and re-test today. Ends up that the watchdog device was being updated by pacemaker due to the /etc/sysconfig/sbd entry: SBD_PACEMAKER=yes. 
When I set that to "no", then after running the "firewall-cmd --panic-on" command,
the sbd daemon detected the lack of activity on /dev/watchdog and self fenced
the node within seconds. Exactly what I was expecting.

When SBD_PACEMAKER was set to "yes", the lack of network connectivity to the
node would be seen and acted upon by the remote nodes (evicts and takes over
ownership of the resources). But the impacted node would just sit logging IO
errors. Pacemaker would keep updating the /dev/watchdog device so SBD would
not self evict.
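The toggle in question lives in /etc/sysconfig/sbd. A hypothetical fragment matching the setup described in the thread — the device path and timeout come from earlier messages, the rest is illustrative:

```
# /etc/sysconfig/sbd (fragment, values illustrative)
SBD_DEVICE=/dev/vdb1            # shared disk watched by the disk watcher
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=20         # seconds before the kernel watchdog fires
SBD_PACEMAKER=yes               # "yes": survive disk loss while Pacemaker is quorate
```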
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sat, 5 Nov 2022 20:54:55 + Robert Hayden wrote: > > -Original Message- > > From: Jehan-Guillaume de Rorthais > > Sent: Saturday, November 5, 2022 3:45 PM > > To: users@clusterlabs.org > > Cc: Robert Hayden > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > > > On Sat, 5 Nov 2022 20:53:09 +0100 > > Valentin Vidić via Users wrote: > > > > > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote: > > > > That was my impression as well...so I may have something wrong. My > > > > expectation was that SBD daemon should be writing to the > > /dev/watchdog > > > > within 20 seconds and the kernel watchdog would self fence. > > > > > > I don't see anything unusual in the config except that pacemaker mode is > > > also enabled. This means that the cluster is providing signal for sbd even > > > when the storage device is down, for example: > > > > > > 883 ?SL 0:00 sbd: inquisitor > > > 892 ?SL 0:00 \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ... > > > 893 ?SL 0:00 \_ sbd: watcher: Pacemaker > > > 894 ?SL 0:00 \_ sbd: watcher: Cluster > > > > > > You can strace different sbd processes to see what they are doing at any > > > point. > > > > I suspect both watchers should detect the loss of network/communication > > with > > the other node. > > > > BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the > > local **Pacemaker** is still quorate (via corosync). See the full chapter: > > «If Pacemaker integration is activated, SBD will not self-fence if > > **device** majority is lost [...]» > > https://urldefense.com/v3/__https://documentation.suse.com/sle-ha/15- > > SP4/html/SLE-HA-all/cha-ha-storage- > > protect.html__;!!ACWV5N9M2RV99hQ!LXxpjg0QHdAP0tvr809WCErcpPH0lx > > MKesDNqK-PU_Xpvb_KIGlj3uJcVLIbzQLViOi3EiSV3bkPUCHr$ > > > > Would it be possible that no node is shutting down because the cluster is in > > two-node mode? Because of this mode, both would keep the quorum > > expecting the > > fencing to kill the other one... 
> > Except there's no active fencing here, only
> > "self-fencing".
>
> I failed to mention I also have a Quorum Device also set up to add its vote to
> the quorum. So two_node is not enabled.

oh, ok.

> I suspect Valentin was on to something with pacemaker keeping the watchdog
> device updated as it thinks the cluster is ok. Need to research and test
> that theory out. I will try to carve some time out next week for that.

AFAIK, Pacemaker strictly relies on SBD to deal with the watchdog. It doesn't
feed it by itself.

In Pacemaker mode, SBD is watching the two most important parts of the cluster:
Pacemaker and Corosync:

* the "Pacemaker watcher" of SBD connects to the CIB and checks that it's still
  updated on a regular basis and that the self-node is marked online.
* the "Cluster watchers" all connect with each other using a dedicated
  communication group in corosync ring(s).

Both watchers can report a failure to SBD that would self-stop the node.

If the network is down, I suppose the cluster watcher should complain. But I
suspect Pacemaker somehow keeps reporting as quorate, thus forbidding SBD to
kill the whole node...

> Appreciate all of the feedback. I have been dealing with Cluster Suite for a
> decade+ but focused on the company's setup. I still have lots to learn,
> which keeps me interested.

+1

Keep us informed!

Regards,
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -Original Message- > From: Jehan-Guillaume de Rorthais > Sent: Saturday, November 5, 2022 3:45 PM > To: users@clusterlabs.org > Cc: Robert Hayden > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests > > On Sat, 5 Nov 2022 20:53:09 +0100 > Valentin Vidić via Users wrote: > > > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote: > > > That was my impression as well...so I may have something wrong. My > > > expectation was that SBD daemon should be writing to the > /dev/watchdog > > > within 20 seconds and the kernel watchdog would self fence. > > > > I don't see anything unusual in the config except that pacemaker mode is > > also enabled. This means that the cluster is providing signal for sbd even > > when the storage device is down, for example: > > > > 883 ?SL 0:00 sbd: inquisitor > > 892 ?SL 0:00 \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ... > > 893 ?SL 0:00 \_ sbd: watcher: Pacemaker > > 894 ?SL 0:00 \_ sbd: watcher: Cluster > > > > You can strace different sbd processes to see what they are doing at any > > point. > > I suspect both watchers should detect the loss of network/communication > with > the other node. > > BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the > local **Pacemaker** is still quorate (via corosync). See the full chapter: > «If Pacemaker integration is activated, SBD will not self-fence if **device** > majority is lost [...]» > https://urldefense.com/v3/__https://documentation.suse.com/sle-ha/15- > SP4/html/SLE-HA-all/cha-ha-storage- > protect.html__;!!ACWV5N9M2RV99hQ!LXxpjg0QHdAP0tvr809WCErcpPH0lx > MKesDNqK-PU_Xpvb_KIGlj3uJcVLIbzQLViOi3EiSV3bkPUCHr$ > > Would it be possible that no node is shutting down because the cluster is in > two-node mode? Because of this mode, both would keep the quorum > expecting the > fencing to kill the other one... Except there's no active fencing here, only > "self-fencing". > I failed to mention I also have a Quorum Device also setup to add its vote to the quorum. 
So two_node is not enabled.

I suspect Valentin was on to something with pacemaker keeping the watchdog
device updated as it thinks the cluster is ok. Need to research and test
that theory out. I will try to carve some time out next week for that.

Appreciate all of the feedback. I have been dealing with Cluster Suite for a
decade+ but focused on the company's setup. I still have lots to learn,
which keeps me interested.

> To verify this guess, check the corosync conf for the "two_node" parameter and
> if both nodes still report as quorate during network outage using:
>
> corosync-quorumtool -s
>
> If this turns out to be a good guess, without **active** fencing, I suppose a
> cluster cannot rely on the two-node mode. I'm not sure what would be the best
> setup though.
>
> Regards,
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sat, 5 Nov 2022 20:53:09 +0100 Valentin Vidić via Users wrote:
> On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote:
> > That was my impression as well...so I may have something wrong. My
> > expectation was that SBD daemon should be writing to the /dev/watchdog
> > within 20 seconds and the kernel watchdog would self fence.
>
> I don't see anything unusual in the config except that pacemaker mode is
> also enabled. This means that the cluster is providing signal for sbd even
> when the storage device is down, for example:
>
>   883 ?  SL  0:00 sbd: inquisitor
>   892 ?  SL  0:00  \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ...
>   893 ?  SL  0:00  \_ sbd: watcher: Pacemaker
>   894 ?  SL  0:00  \_ sbd: watcher: Cluster
>
> You can strace different sbd processes to see what they are doing at any
> point.

I suspect both watchers should detect the loss of network/communication with
the other node.

BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the
local **Pacemaker** is still quorate (via corosync). See the full chapter:
«If Pacemaker integration is activated, SBD will not self-fence if **device**
majority is lost [...]»

https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html

Would it be possible that no node is shutting down because the cluster is in
two-node mode? Because of this mode, both would keep the quorum expecting the
fencing to kill the other one... Except there's no active fencing here, only
"self-fencing".

To verify this guess, check the corosync conf for the "two_node" parameter and
if both nodes still report as quorate during network outage using:

corosync-quorumtool -s

If this turns out to be a good guess, without **active** fencing, I suppose a
cluster cannot rely on the two-node mode. I'm not sure what would be the best
setup though.

Regards,
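The quorum check suggested above can be wrapped in a small helper for a test harness. A sketch — the `QUORUMTOOL` override is an assumption added so the helper can be exercised without a live cluster:

```shell
#!/usr/bin/env bash
# Returns success if this node currently reports quorate, by parsing
# `corosync-quorumtool -s` output ("Quorate: Yes" / "Quorate: No").
quorate() {
    ${QUORUMTOOL:-corosync-quorumtool} -s | grep -Eq 'Quorate:[[:space:]]+Yes'
}
```

Polling `quorate` on both nodes during the induced outage should show exactly one side (the one still reaching the qdevice) staying quorate in a healthy setup; the thread's symptom is that the isolated node keeps answering Yes.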
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote:
> That was my impression as well... so I may have something wrong. My
> expectation was that the SBD daemon should be writing to /dev/watchdog
> within 20 seconds and the kernel watchdog would self-fence.

I don't see anything unusual in the config except that pacemaker mode is
also enabled. This means that the cluster is providing the signal for sbd
even when the storage device is down, for example:

  883 ?  SL  0:00 sbd: inquisitor
  892 ?  SL  0:00  \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: 18b958fa-fdae-455a-aa9d-a204a6eed04b
  893 ?  SL  0:00  \_ sbd: watcher: Pacemaker
  894 ?  SL  0:00  \_ sbd: watcher: Cluster

You can strace the different sbd processes to see what they are doing at any
point.

An easy way to test whether the watchdog is working is to pause all sbd
processes, for example:

  # pkill -STOP sbd

For me this causes a node reset after 5 seconds, as defined by:

  SBD_WATCHDOG_TIMEOUT=5

-- 
Valentin
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -----Original Message-----
> From: Users On Behalf Of Valentin Vidic via Users
> Sent: Saturday, November 5, 2022 1:07 PM
> To: users@clusterlabs.org
> Cc: Valentin Vidić
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>
> On Sat, Nov 05, 2022 at 05:20:47PM +, Robert Hayden wrote:
> > The OCI compute instances don't have a hardware watchdog, only the
> > software watchdog. So, when the network goes completely hung (e.g.
> > firewall-cmd panic-on), all network traffic stops, which implies that
> > IO to the SBD device also stops. I do not see the software watchdog
> > take any action in response to the network hang.
>
> It seems like the watchdog is not working or is not configured with a
> correct timeout here. sbd will not refresh the watchdog if it fails to
> read from the disk, so the watchdog should eventually expire and reset
> the node.

That was my impression as well... so I may have something wrong. My
expectation was that the SBD daemon should be writing to /dev/watchdog
within 20 seconds and the kernel watchdog would self-fence.
Here is my setup:

root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS=

root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped

root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
SBD STATUS
<node name>: <installed> | <enabled> | <running>
dh2vgmprepap03: YES | YES | YES
dh2vgmprepap02: YES | YES | YES

Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
0       dh2vgmprepap03  clear
1       dh2vgmprepap02  clear

SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped

> -- 
> Valentin
Re: [ClusterLabs] [External] : Re: Fence Agent tests
On Sat, Nov 05, 2022 at 05:20:47PM +, Robert Hayden wrote:
> The OCI compute instances don't have a hardware watchdog, only the
> software watchdog. So, when the network goes completely hung (e.g.
> firewall-cmd panic-on), all network traffic stops, which implies that IO
> to the SBD device also stops. I do not see the software watchdog take
> any action in response to the network hang.

It seems like the watchdog is not working or is not configured with a
correct timeout here. sbd will not refresh the watchdog if it fails to
read from the disk, so the watchdog should eventually expire and reset
the node.

-- 
Valentin
Re: [ClusterLabs] [External] : Re: Fence Agent tests
> -----Original Message-----
> From: Users On Behalf Of Andrei Borzenkov
> Sent: Saturday, November 5, 2022 1:17 AM
> To: users@clusterlabs.org
> Subject: [External] : Re: [ClusterLabs] Fence Agent tests
>
> On 04.11.2022 23:46, Robert Hayden wrote:
> > I am working on a fencing agent for the Oracle Cloud Infrastructure
> > (OCI) environment to complete power fencing of compute instances. The
> > only fencing setups I have seen for OCI use SBD, but that is
> > insufficient under full network interruptions, since OCI uses iSCSI to
> > write/read the SBD disk.
>
> Out of curiosity - why is it insufficient? If a cluster node is
> completely isolated, it should commit suicide. If the host where the
> cluster node is running is completely isolated, then you cannot do
> anything with this host anyway.

Personally, this was my first attempt with SBD, so I may be missing some
core protections. I am more familiar with IPMILAN power fencing. In my
testing with a full network hang (firewall-cmd panic-on), I was not getting
the expected fencing results with SBD that I would get with power fencing.
Hence my long-overdue learning of Python, to then take a crack at writing a
fencing agent.

In my configuration, I am using HA-LVM (vg tags) to protect XFS file
systems. When resources fail over, the file system moves to another node.

The OCI compute instances don't have a hardware watchdog, only the software
watchdog. So, when the network goes completely hung (e.g. firewall-cmd
panic-on), all network traffic stops, which implies that IO to the SBD
device also stops. I do not see the software watchdog take any action in
response to the network hang.

The remote node will see the network issue and write out the reset message
into the SBD device slot, telling the hung node to commit suicide. But the
impacted node cannot read the SBD device, so it never gets the message. It
just sits there. Applications can still run, but they don't have access to
the disks either (which is good).
In the full network hang, the remote node will wait until 2x the SBD
msg-timeout and then assume fencing was successful. It will then attempt to
move the XFS file systems over. If the network-hung node wakes up, I now
have the XFS file systems mounted on both nodes, leading to corruption.

This may be eliminated if I moved the HA-LVM setup from vg tags to
system_id. With vg tags, Pacemaker adds a "pacemaker" tag to all controlled
volume groups regardless of which node has the vg activated. With
system_id, the node's uname is added to the vg metadata, so each node knows
who officially has the vg activated. I have not played with that scenario
in OCI just yet. I am not sure whether Pacemaker would simply remove the
other node's uname and add its own when it attempts to move the resource.
It is on my list to test, because we moved to the uname setup with Linux 8.

Again, this was my first attempt with SBD, so I may have it set up
completely wrong.

> I am not familiar with OCI architecture so I may be missing something
> obvious here.