On 15/11/2021 12:03, Klaus Wenninger wrote:
On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov <[email protected]> wrote:
On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger <[email protected]> wrote:
>
> On Mon, Nov 15, 2021 at 10:37 AM S Rogers <[email protected]> wrote:
>>
>> I had thought about doing that, but the cluster is then dependent on
>> the external system, and if that external system was to go down or
>> become unreachable for any reason, then it would falsely cause the
>> cluster to fail over, or worse, it could even take the cluster down
>> completely, if the external system goes down and both nodes cannot
>> ping it.
>
> You wouldn't necessarily have to ban resources from nodes that can't
> reach the external network. It would be enough to make them prefer
> the location that has the connection. So if both lose connection, one
> side would still stay up.
> To avoid depending on something truly external, you might use the
> router to your external network as the ping target.
> In case of fencing - triggered by whatever - and a potential fence-race
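The preference-based approach Klaus describes might look roughly like the following pcs sketch. The resource names, the `pingd` attribute, and the router IP are illustrative assumptions, not taken from this thread:

```shell
# Sketch only: make nodes *prefer* a node with external connectivity
# rather than banning disconnected nodes outright. Names/IPs are examples.

# Clone a ping resource that monitors reachability of the router and
# records the result in the 'pingd' node attribute on every node.
pcs resource create ping-gw ocf:pacemaker:ping \
    host_list=192.168.1.1 multiplier=100 dampen=5s \
    op monitor interval=10s clone

# A finite positive score (not INFINITY) only *prefers* connected nodes;
# if both nodes lose the ping target, services stay where they are
# instead of being stopped everywhere.
pcs constraint location my-resource rule score=100 pingd gt 0
```

The key design point is the finite rule score: with `-INFINITY` bans, losing the ping target on both nodes would stop the resource everywhere, whereas a preference leaves one side running.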
The problem here is that nothing really triggers fencing. What happens is:
Got that! Which is why I gave the hint on how to prevent shutting down
services with ping first.
Taking care of what happens when nodes are fenced still makes sense.
Imagine a fence race in which the node running the services loses, only
to have the services moved back to it when it comes up again.
Klaus
Thanks, I wasn't aware of priority-fencing-delay. While it doesn't solve
this problem, I can still use it to improve the fencing behaviour of the
cluster in general.
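Configuring priority-fencing-delay might look roughly like this (a sketch; the resource name and values are assumptions, and the option requires a reasonably recent Pacemaker, 2.0.4 or later):

```shell
# Sketch: give the node currently running important resources a head
# start in a fence race. Names and values are illustrative.

# Assign a priority to the valuable resource; the node running the
# resources with the highest total priority gets the advantage.
pcs resource update my-postgres meta priority=10

# The node hosting lower-priority (or no) resources waits this long
# before fencing, so the node running the services wins the race.
pcs property set priority-fencing-delay=15s
```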
Unfortunately, in some situations this cluster will be deployed in a
completely isolated network so there may not even be a router that we
can use as a ping target, and we can't guarantee the presence of any
other system on the network that we could reliably use as a ping target.
- two postgres instances lose connection over the external network, but
  the cluster nodes retain connectivity over another network
- the postgres RA compares the "latest timestamp" when selecting the
  best node to fail over to
- the primary postgres has the better timestamp, so the RA simply does
  not consider the secondary suitable for (automatic) failover
The only solution here - as long as fencing the node on external
connectivity loss is acceptable - is modifying the ethmonitor RA to fail
the monitor operation in this case.
I was hoping to find a way to achieve the desired outcome without
resorting to a custom RA, but that does appear to be the only solution.
This may not be the right audience, but does anyone know whether it
would be viable to add an additional parameter to the ethmonitor RA that
allows users to override the behaviour when the monitor operation
detects a downed interface? (i.e., a 'monitor_force_fail' parameter
that, when set to true, causes the monitor operation to fail if it
determines the interface is down)
Being relatively new to Pacemaker, I don't know whether this goes
against RA conventions/practices.
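For illustration only, a minimal sketch of how such a hypothetical 'monitor_force_fail' parameter might hook into the monitor action. This is not the actual ethmonitor code; the function shape and parameter names are assumptions:

```shell
# Hypothetical sketch - not the real ethmonitor RA. Return codes follow
# the OCF convention.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# link_up:            1 = interface up, 0 = down (as detected by the RA)
# monitor_force_fail: the proposed new parameter ("true"/"false")
ethmonitor_monitor() {
    link_up=$1
    monitor_force_fail=$2

    if [ "$link_up" -eq 1 ]; then
        return $OCF_SUCCESS
    fi

    if [ "$monitor_force_fail" = "true" ]; then
        # Proposed behaviour: report a hard monitor failure so the
        # cluster recovers (and can ultimately fence) instead of only
        # updating a node attribute.
        return $OCF_ERR_GENERIC
    fi

    # Current behaviour: monitor reports success and ethmonitor merely
    # records the link state in a node attribute for constraints.
    return $OCF_SUCCESS
}
```

Since the parameter would default to false, existing deployments would keep the current attribute-only behaviour, which seems consistent with how other RAs add opt-in options.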
> you might use the rather new feature priority-fencing-delay (give the
> node that is running valuable resources a benefit in the race) or go
> for fence_heuristics_ping (a pseudo fence-resource that, together with
> a fencing-topology, prevents the node without access to a certain IP
> from fencing the other node).
>
> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
>
> Klaus
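The fence_heuristics_ping setup Klaus mentions might be wired up roughly as follows. This is a sketch: the device names, node names, ping target, and the choice of fence_ipmilan as the real fence device are assumptions for illustration:

```shell
# Sketch: a heuristics "fence device" that only succeeds on a node that
# can reach the given IP, combined with the real fence device in one
# topology level. Names/IPs are examples.
pcs stonith create fence-heuristic fence_heuristics_ping \
    ping_targets=192.168.1.1

pcs stonith create fence-real fence_ipmilan \
    ip=10.0.0.10 username=admin password=secret \
    pcmk_host_list="node1 node2"

# Every device within a single topology level must succeed, so a node
# that cannot ping the target never reaches the real fence device and
# thus cannot fence its peer.
pcs stonith level add 1 node1 fence-heuristic,fence-real
pcs stonith level add 1 node2 fence-heuristic,fence-real
```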
> _______________________________________________
>>
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>