Re: [ClusterLabs] Fence node when network interface goes down
On Mon, Nov 15, 2021 at 3:32 PM S Rogers wrote:
>> The only solution here - as long as fencing node on external
>> connectivity loss is acceptable - is modifying ethmonitor RA to fail
>> monitor operation in this case.
>
> I was hoping to find a way to achieve the desired outcome without
> resorting to a custom RA, but it does appear to be the only solution.

Well, looking at it from a different angle - you could use the knet
nozzle interface for replication, which means your postgres connectivity
is guaranteed to be the same as pacemaker/corosync connectivity.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
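For reference, the knet nozzle interface mentioned above is configured in corosync.conf. A minimal sketch might look like the following (directive and option names as described in the corosync.conf(5) man page; the address is illustrative and this fragment is untested):

```
nozzle {
    name: nozzle0        # tap interface corosync creates on each node
    ipaddr: 10.10.10.0   # network used for nozzle addresses (illustrative)
    ipprefix: 24
}
```

Replication traffic routed over this interface then shares fate with corosync's knet links, which is the point of the suggestion.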
Re: [ClusterLabs] Fence node when network interface goes down
On 15/11/2021 12:03, Klaus Wenninger wrote:
> On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov wrote:
>> On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger wrote:
>>> On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote:
>>>> I had thought about doing that, but the cluster is then dependent on
>>>> the external system, and if that external system was to go down or
>>>> become unreachable for any reason then it would falsely cause the
>>>> cluster to fail over, or worse it could even take the cluster down
>>>> completely, if the external system goes down and both nodes cannot
>>>> ping it.
>>>
>>> You wouldn't necessarily have to ban resources from nodes that can't
>>> reach the external network. It would be enough to make them prefer
>>> the location that has connection. So if both lose connection one side
>>> would still stay up.
>>> Not to depend on something really external you might use the
>>> router to your external network as ping target.
>>> In case of fencing - triggered by whatever - and a potential fence-race
>>
>> The problem here is that nothing really triggers fencing.
>
> Got that! Which is why I gave the hint how to prevent shutting down
> services with ping first. Taking care of what happens when nodes are
> fenced still makes sense. Imagine a fence-race where the node running
> services loses, just to get the services moved back when it comes up
> again afterwards.
>
> Klaus

Thanks, I wasn't aware of priority-fencing-delay. While it doesn't solve
this problem, I can still use it to improve the fencing behaviour of the
cluster in general.

Unfortunately, in some situations this cluster will be deployed in a
completely isolated network, so there may not even be a router that we
can use as a ping target, and we can't guarantee the presence of any
other system on the network that we could reliably use as a ping target.

>> What happens is:
>>
>> - two postgres instances lose connection over the external network,
>>   but the cluster nodes retain connectivity over another network
>> - the postgres RA compares the "latest timestamp" when selecting the
>>   best node to fail over to
>> - the primary postgres has the better timestamp, so the RA simply does
>>   not consider the secondary as suitable for (automatic) failover
>>
>> The only solution here - as long as fencing the node on external
>> connectivity loss is acceptable - is modifying the ethmonitor RA to
>> fail the monitor operation in this case.

I was hoping to find a way to achieve the desired outcome without
resorting to a custom RA, but it does appear to be the only solution.

This may not be the right audience, but does anyone know if it is a
viable change to add an additional parameter to the ethmonitor RA that
allows users to override the desired behaviour when the monitor
operation fails? (i.e. a 'monitor_force_fail' parameter that, when set
to true, will cause the monitor operation to fail if it determines the
interface is down.) Being relatively new to Pacemaker, I don't know
whether this goes against RA conventions/practices.

>>> You might use the rather new feature priority-fencing-delay (give the
>>> node that is running valuable resources a benefit in the race) or go
>>> for fence_heuristics_ping (a pseudo fence-resource that, together
>>> with a fencing-topology, prevents the node without access to a
>>> certain IP from fencing the other node).
>>>
>>> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
>>> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
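To make the 'monitor_force_fail' idea concrete, the decision could look something like the sketch below. This is purely illustrative - the parameter does not exist in the upstream ethmonitor agent, and this is not the actual RA code:

```shell
# Hypothetical sketch only: 'monitor_force_fail' is a proposed, not a
# real, ethmonitor parameter.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

monitor_result() {
    # $1: link state the agent detected (1 = up, 0 = down)
    # $2: value of the hypothetical monitor_force_fail parameter
    link_up=$1
    force_fail=${2:-false}
    if [ "$link_up" -eq 0 ] && [ "$force_fail" = "true" ]; then
        # Fail the monitor so that on-fail=fence (or the default
        # stop/fence escalation) can actually trigger
        return $OCF_ERR_GENERIC
    fi
    # Upstream behaviour: report success and only update the node
    # attribute that location constraints key off
    return $OCF_SUCCESS
}
```

With force_fail enabled, a detected link loss would surface as a monitor failure instead of only flipping the ethmonitor-public attribute.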
Re: [ClusterLabs] Fence node when network interface goes down
On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov wrote:
> On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger wrote:
>> On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote:
>>> I had thought about doing that, but the cluster is then dependent on
>>> the external system, and if that external system was to go down or
>>> become unreachable for any reason then it would falsely cause the
>>> cluster to fail over, or worse it could even take the cluster down
>>> completely, if the external system goes down and both nodes cannot
>>> ping it.
>>
>> You wouldn't necessarily have to ban resources from nodes that can't
>> reach the external network. It would be enough to make them prefer
>> the location that has connection. So if both lose connection one side
>> would still stay up.
>> Not to depend on something really external you might use the
>> router to your external network as ping target.
>> In case of fencing - triggered by whatever - and a potential fence-race
>
> The problem here is that nothing really triggers fencing.

Got that! Which is why I gave the hint how to prevent shutting down
services with ping first. Taking care of what happens when nodes are
fenced still makes sense. Imagine a fence-race where the node running
services loses, just to get the services moved back when it comes up
again afterwards.

Klaus

> What happens is:
>
> - two postgres instances lose connection over the external network,
>   but the cluster nodes retain connectivity over another network
> - the postgres RA compares the "latest timestamp" when selecting the
>   best node to fail over to
> - the primary postgres has the better timestamp, so the RA simply does
>   not consider the secondary as suitable for (automatic) failover
>
> The only solution here - as long as fencing the node on external
> connectivity loss is acceptable - is modifying the ethmonitor RA to
> fail the monitor operation in this case.

>> You might use the rather new feature priority-fencing-delay (give the
>> node that is running valuable resources a benefit in the race) or go
>> for fence_heuristics_ping (a pseudo fence-resource that, together with
>> a fencing-topology, prevents the node without access to a certain IP
>> from fencing the other node).
>>
>> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
>> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
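For anyone wanting to try priority-fencing-delay, a rough pcs sketch follows (untested; the resource name matches the configuration posted elsewhere in the thread, and the priority and delay values are illustrative):

```shell
# Give the promotable resource a priority so the node hosting the
# promoted instance gets a head start in a fence race
pcs resource meta pgsqld-clone priority=10
pcs property set priority-fencing-delay=15s
```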
Re: [ClusterLabs] Fence node when network interface goes down
On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger wrote:
> On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote:
>> I had thought about doing that, but the cluster is then dependent on
>> the external system, and if that external system was to go down or
>> become unreachable for any reason then it would falsely cause the
>> cluster to fail over, or worse it could even take the cluster down
>> completely, if the external system goes down and both nodes cannot
>> ping it.
>
> You wouldn't necessarily have to ban resources from nodes that can't
> reach the external network. It would be enough to make them prefer
> the location that has connection. So if both lose connection one side
> would still stay up.
> Not to depend on something really external you might use the
> router to your external network as ping target.
> In case of fencing - triggered by whatever - and a potential fence-race

The problem here is that nothing really triggers fencing. What happens is:

- two postgres instances lose connection over the external network, but
  the cluster nodes retain connectivity over another network
- the postgres RA compares the "latest timestamp" when selecting the
  best node to fail over to
- the primary postgres has the better timestamp, so the RA simply does
  not consider the secondary as suitable for (automatic) failover

The only solution here - as long as fencing the node on external
connectivity loss is acceptable - is modifying the ethmonitor RA to fail
the monitor operation in this case.

> You might use the rather new feature priority-fencing-delay (give the
> node that is running valuable resources a benefit in the race) or go
> for fence_heuristics_ping (a pseudo fence-resource that, together with
> a fencing-topology, prevents the node without access to a certain IP
> from fencing the other node).
>
> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
Re: [ClusterLabs] Fence node when network interface goes down
On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote:
> I had thought about doing that, but the cluster is then dependent on
> the external system, and if that external system was to go down or
> become unreachable for any reason then it would falsely cause the
> cluster to fail over, or worse it could even take the cluster down
> completely, if the external system goes down and both nodes cannot
> ping it.

You wouldn't necessarily have to ban resources from nodes that can't
reach the external network. It would be enough to make them prefer the
location that has connection. So if both lose connection, one side would
still stay up. Not to depend on something really external, you might use
the router to your external network as ping target.

In case of fencing - triggered by whatever - and a potential fence-race,
you might use the rather new feature priority-fencing-delay (give the
node that is running valuable resources a benefit in the race) or go for
fence_heuristics_ping (a pseudo fence-resource that, together with a
fencing-topology, prevents the node without access to a certain IP from
fencing the other node).

https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py

Klaus
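A sketch of the "prefer, don't ban" approach using ocf:pacemaker:ping might look like this (the resource name is taken from the configuration posted elsewhere in the thread; addresses, scores, and the exact pcs rule syntax are illustrative and untested):

```shell
pcs resource create router_ping ocf:pacemaker:ping \
    host_list=192.168.50.1 dampen=5s multiplier=1000 \
    op monitor interval=10s clone
# A finite (non-INFINITY) score is a preference, not a ban: if both
# nodes lose the ping target, the resource can still run somewhere
pcs constraint location pgsqld-clone rule score=1000 pingd gt 0
```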
Re: [ClusterLabs] Fence node when network interface goes down
I had thought about doing that, but the cluster is then dependent on the
external system, and if that external system was to go down or become
unreachable for any reason then it would falsely cause the cluster to
fail over, or worse it could even take the cluster down completely, if
the external system goes down and both nodes cannot ping it.
Re: [ClusterLabs] Fence node when network interface goes down
Have you tried with ping and a location constraint for avoiding hosts
that cannot ping an external system?

Best Regards,
Strahil Nikolov

On Mon, Nov 15, 2021 at 0:07, S Rogers wrote:
> Using on-fail=fence is what I initially tried, but it doesn't work
> unfortunately. It looks like this is because the ethmonitor monitor
> operation won't actually fail when it detects a downed interface. It'll
> only fail if it is unable to update the CIB, as per this comment:
> https://github.com/ClusterLabs/resource-agents/blob/4824a7a83765a0596b7d9856d00102f53c8ce123/heartbeat/ethmonitor#L518
Re: [ClusterLabs] Fence node when network interface goes down
Using on-fail=fence is what I initially tried, but it doesn't work
unfortunately. It looks like this is because the ethmonitor monitor
operation won't actually fail when it detects a downed interface. It'll
only fail if it is unable to update the CIB, as per this comment:
https://github.com/ClusterLabs/resource-agents/blob/4824a7a83765a0596b7d9856d00102f53c8ce123/heartbeat/ethmonitor#L518
Re: [ClusterLabs] Fence node when network interface goes down
The mentioned error occurs when attempting to promote the PostgreSQL
resource on the standby node, after the master PostgreSQL resource is
stopped.

For info, here is my configuration:

Corosync Nodes:
 node1.local node2.local
Pacemaker Nodes:
 node1.local node2.local

Resources:
 Clone: public_network_monitor-clone
  Resource: public_network_monitor (class=ocf provider=heartbeat type=ethmonitor)
   Attributes: interface=eth0 link_status_only=true name=ethmonitor-public
   Operations: monitor interval=10s timeout=60s (public_network_monitor-monitor-interval-10s)
               start interval=0s timeout=60s (public_network_monitor-start-interval-0s)
               stop interval=0s timeout=20s (public_network_monitor-stop-interval-0s)
 Clone: pgsqld-clone
  Meta Attrs: notify=true promotable=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/12/bin datadir=/var/lib/postgresql/12/main pgdata=/etc/postgresql/12/main
   Operations: demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
               start interval=0s timeout=60s (pgsqld-start-interval-0s)
               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
 Resource: public_virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=192.168.50.3 nic=mgnet0
  Operations: monitor interval=30s (public_virtual_ip-monitor-interval-30s)
              start interval=0s timeout=20s (public_virtual_ip-start-interval-0s)
              stop interval=0s timeout=20s (public_virtual_ip-stop-interval-0s)

Stonith Devices:
 Resource: node1_fence_agent (class=stonith type=fence_ssh)
  Attributes: hostname=192.168.60.1 pcmk_delay_base=15 pcmk_host_list=node1.local user=root
  Operations: monitor interval=60s (node1_fence_agent-monitor-interval-60s)
 Resource: node2_fence_agent (class=stonith type=fence_ssh)
  Attributes: hostname=192.168.60.2 pcmk_host_list=node2.local user=root
  Operations: monitor interval=60s (node2_fence_agent-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: node1_fence_agent
    Disabled on: node1.local (score:-INFINITY) (id:location-node1_fence_agent-node1.local--INFINITY)
  Resource: node2_fence_agent
    Disabled on: node2.local (score:-INFINITY) (id:location-node2_fence_agent-node2.local--INFINITY)
  Resource: public_virtual_ip
    Constraint: location-public_virtual_ip
      Rule: score=INFINITY (id:location-public_virtual_ip-rule)
        Expression: ethmonitor-public eq 1 (id:location-public_virtual_ip-rule-expr)
Ordering Constraints:
  promote pgsqld-clone then start public_virtual_ip (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-public_virtual_ip-Mandatory)
  demote pgsqld-clone then stop public_virtual_ip (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-public_virtual_ip-Mandatory-1)
Colocation Constraints:
  public_virtual_ip with pgsqld-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-public_virtual_ip-pgsqld-clone-INFINITY)
Ticket Constraints:

This is my understanding of the sequence of events:

1. Node1 is running the PostgreSQL resource as master, Node2 is running
the PostgreSQL resource as standby. Everything is working okay at this
point.

2. On Node1, the public network goes down and ethmonitor changes the
ethmonitor-public node attribute from 1 to 0.

3. The location-public_virtual_ip constraint (which requires the IP to
run on a node with ethmonitor-public==1) kicks in, and Pacemaker demotes
the master PostgreSQL resource so that it can then promote it on Node2.

4. The primary PostgreSQL instance on Node1 attempts to shut down in
response to the demotion, but it can't connect to the standby so is
unable to stop cleanly. The PostgreSQL resource shows as demoting for
60 seconds, as below:

 Clone Set: pgsqld-clone [pgsqld] (promotable)
     pgsqld (ocf::heartbeat:pgsqlms): Demoting node1.local
     Slaves: [ node2.local ]

5. After a minute, the demotion finishes and Pacemaker attempts to
promote the PostgreSQL resource on Node2. This action fails with the
"Switchover has been canceled from pre-promote action" error, because
the standby didn't receive the final WAL activity from the primary.

6. Due to the failed promotion on Node2, PAF/Pacemaker promotes the
PostgreSQL resource on Node1 again. However, due to the public network
interface being down, the PostgreSQL and virtual IP resources provided
by the HA cluster are now completely unavailable.
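While reproducing step 2, the attribute flip can be observed directly with the standard Pacemaker CLI tools; for example (a sketch, options from the tools' man pages, untested here):

```shell
# One-shot cluster status including node attributes
crm_mon -1 -A
# Query the transient attribute that ethmonitor maintains
attrd_updater --query --name ethmonitor-public --node node1.local
```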
[ClusterLabs] Fence node when network interface goes down
Hi, I'm hoping someone will be able to point me in the right direction.

I am configuring a two-node active/passive cluster that utilises the
PostgreSQL PAF resource agent. Each node has two NICs, therefore the
cluster is configured with two corosync links - one on each network (one
network is the public network, the other is effectively private and just
used for cluster communication). The cluster has a virtual IP resource,
which has a colocation constraint to keep it together with the primary
Postgres instance.

I am trying to protect against the scenario where the public network
interface on the active node goes down, in which case I want a failover
to occur and the other node to take over and host the primary Postgres
instance and the public virtual IP. My current approach is to use
ocf:heartbeat:ethmonitor to monitor the public interface, along with a
location constraint to ensure that the virtual IP must be on a node
where the public interface is UP.

With this configuration, if I disconnect the active node from the public
network, Pacemaker attempts to move the primary PostgreSQL and virtual
IP to the other node. The problem is that it attempts to stop the
resources gracefully, which causes the pgsql resource to error with
"Switchover has been canceled from pre-promote action" (which I believe
is because PostgreSQL shuts down but can't communicate with the standby
during the shutdown - a similar situation to what is described here:
https://github.com/ClusterLabs/PAF/issues/149)

Ideally, if the public network interface on the active node goes down, I
would want to take that node offline (either fence it or put it in
standby mode, so that no resources can run on it), leaving just the
other node in the cluster as the active node. Then the old primary can
be rebuilt from the new primary in order to join the cluster again.

However, I can't figure out a way to cause the active node to be fenced
as a result of ocf:heartbeat:ethmonitor detecting that the interface has
gone down.

Does anyone have any ideas/pointers how I could achieve this, or an
alternative approach?

Hopefully that makes sense. Any help is appreciated!

Thanks.
Re: [ClusterLabs] Fence node when network interface goes down
On Fri, 2021-11-12 at 17:31 +, S Rogers wrote:
> Hi, I'm hoping someone will be able to point me in the right
> direction.
> 
> I am configuring a two-node active/passive cluster that utilises the
> PostgreSQL PAF resource agent. Each node has two NICs, therefore the
> cluster is configured with two corosync links - one on each network
> (one network is the public network, the other is effectively private
> and just used for cluster communication). The cluster has a virtual
> IP resource, which has a colocation constraint to keep it together
> with the primary Postgres instance.
> 
> I am trying to protect against the scenario where the public network
> interface on the active node goes down, in which case I want a
> failover to occur and the other node to take over and host the
> primary Postgres instance and the public virtual IP. My current
> approach is to use ocf:heartbeat:ethmonitor to monitor the public
> interface along with a location constraint to ensure that the virtual
> IP must be on a node where the public interface is UP.
> 
> With this configuration, if I disconnect the active node from the
> public network, Pacemaker attempts to move the primary PostgreSQL and
> virtual IP to the other node. The problem is that it attempts to stop
> the resources gracefully, which causes the pgsql resource to error
> with "Switchover has been canceled from pre-promote action" (which I
> believe is because PostgreSQL shuts down, but can't communicate with
> the standby during the shutdown - a similar situation to what is
> described here: https://github.com/ClusterLabs/PAF/issues/149)
> 
> Ideally, if the public network interface on the active node goes down
> I would want to take that node offline (either fence it or put it in
> standby mode, so that no resources can run on it), leaving just the
> other node in the cluster as the active node. Then the old primary
> can be rebuilt from the new primary in order to join the cluster
> again. However, I can't figure out a way to cause the active node to
> be fenced as a result of ocf:heartbeat:ethmonitor detecting that the
> interface has gone down.
> 
> Does anyone have any ideas/pointers how I could achieve this, or an
> alternative approach?
> 
> Hopefully that makes sense. Any help is appreciated!
> 
> Thanks.

Failure handling is configurable via the on-fail meta-attribute. You can
set on-fail=fence for the ethmonitor resource's monitor action to fence
the node if the monitor fails.

There's also on-fail=standby, but that will still try to stop any active
resources gracefully, so it doesn't help in this case.
-- 
Ken Gaillot
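In pcs terms, the on-fail=fence suggestion would look roughly like this (a sketch against the ethmonitor resource described elsewhere in this thread; untested):

```shell
pcs resource update public_network_monitor \
    op monitor interval=10s timeout=60s on-fail=fence
```

Note that this only helps if the monitor action actually returns a failure when the interface goes down.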
Re: [ClusterLabs] Fence node when network interface goes down
On 12.11.2021 20:31, S Rogers wrote:
> Hi, I'm hoping someone will be able to point me in the right direction.
> 
> I am configuring a two-node active/passive cluster that utilises the
> PostgreSQL PAF resource agent. Each node has two NICs, therefore the
> cluster is configured with two corosync links - one on each network
> (one network is the public network, the other is effectively private
> and just used for cluster communication). The cluster has a virtual IP
> resource, which has a colocation constraint to keep it together with
> the primary Postgres instance.
> 
> I am trying to protect against the scenario where the public network
> interface on the active node goes down, in which case I want a failover
> to occur and the other node to take over and host the primary Postgres
> instance and the public virtual IP. My current approach is to use
> ocf:heartbeat:ethmonitor to monitor the public interface along with a
> location constraint to ensure that the virtual IP must be on a node
> where the public interface is UP.
> 
> With this configuration, if I disconnect the active node from the
> public network, Pacemaker attempts to move the primary PostgreSQL and
> virtual IP to the other node. The problem is that it attempts to stop
> the resources gracefully, which causes the pgsql resource to error with
> "Switchover has been canceled from pre-promote action" (which I believe
> is because PostgreSQL shuts down, but can't communicate with the
> standby during the shutdown - a similar situation to what is described
> here: https://github.com/ClusterLabs/PAF/issues/149)
> 
> Ideally, if the public network interface on the active node goes down I
> would want to take that node offline (either fence it or put it in
> standby mode, so that no resources can run on it), leaving just the
> other node in the cluster as the active node. Then the old primary can
> be rebuilt from the new primary in order to join the cluster again.
> However, I can't figure out a way to cause the active node to be fenced
> as a result of ocf:heartbeat:ethmonitor detecting that the interface
> has gone down.
> 
> Does anyone have any ideas/pointers how I could achieve this, or an
> alternative approach?

If stopping a resource fails, the default Pacemaker reaction is to fence
the node. Assuming "causes the pgsql resource to error" means "stopping
the resource fails", it should already do what you want. Show logs from
both nodes around the time you simulate the error.