On 15/11/2021 12:03, Klaus Wenninger wrote:


On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov <[email protected]> wrote:

    On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger
    <[email protected]> wrote:
    >
    >
    >
    > On Mon, Nov 15, 2021 at 10:37 AM S Rogers
    > <[email protected]> wrote:
    >>
    >> I had thought about doing that, but the cluster is then dependent
    >> on the external system, and if that external system were to go down
    >> or become unreachable for any reason then it would falsely cause
    >> the cluster to fail over or, worse, it could even take the cluster
    >> down completely, if the external system goes down and both nodes
    >> cannot ping it.
    >
    > You wouldn't necessarily have to ban resources from nodes that
    > can't reach the external network. It would be enough to make them
    > prefer the location that has a connection, so if both lose the
    > connection one side would still stay up.
    > To avoid depending on something truly external you might use the
    > router to your external network as the ping target.
    > In case of fencing - triggered by whatever - and a potential
    > fence-race

    The problem here is that nothing really triggers fencing. What
    happens is:


Got that! Which is why I gave the hint about using ping to prevent
shutting down services in the first place.
Taking care of what happens when nodes are fenced still makes sense.
Imagine a fence-race where the node running the services loses, only to
have the services moved back when it comes up again.
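As a rough sketch of that ping idea - the address, resource name, and
score below are made up, adjust them to your setup:

```shell
# Sketch only: 192.168.1.1 stands in for your router, "my-services"
# for the resources that should prefer the connected node.
pcs resource create ping ocf:pacemaker:ping \
    host_list=192.168.1.1 dampen=5s multiplier=1000 \
    op monitor interval=10s clone

# A finite positive score expresses a preference, not a requirement:
# if both nodes lose the ping target, the services still stay up on
# one node instead of being stopped everywhere.
pcs constraint location my-services rule score=1000 pingd gt 0
```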

Klaus
Thanks, I wasn't aware of priority-fencing-delay. While it doesn't solve this problem, I can still use it to improve the fencing behaviour of the cluster in general.
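For reference, turning it on looks roughly like this (a sketch assuming
Pacemaker >= 2.0.4 and pcs; the resource name is illustrative):

```shell
# Give the node currently running high-priority resources a head start
# in a fence race.
pcs property set priority-fencing-delay=15s

# The delay only takes effect for resources that carry a priority.
pcs resource meta my-important-resource priority=10
```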

Unfortunately, in some situations this cluster will be deployed in a completely isolated network so there may not even be a router that we can use as a ping target, and we can't guarantee the presence of any other system on the network that we could reliably use as a ping target.


    - the two postgres instances lose connection over the external
      network, but the cluster nodes retain connectivity over another
      network
    - the postgres RA compares the "latest timestamp" when selecting the
      best node to fail over to
    - the primary postgres has the better timestamp, so the RA simply
      does not consider the secondary suitable for (automatic) failover

    The only solution here - as long as fencing a node on external
    connectivity loss is acceptable - is modifying the ethmonitor RA to
    fail the monitor operation in this case.

I was hoping to find a way to achieve the desired outcome without resorting to a custom RA, but it does appear to be the only solution.

This may not be the right audience, but does anyone know whether it would be a viable change to add a parameter to the ethmonitor RA that lets users override the behaviour when the interface is down? (i.e., a 'monitor_force_fail' parameter that, when set to true, causes the monitor operation to fail if it determines the interface is down)

Being relatively new to pacemaker, I don't know whether this goes against RA conventions/practices.
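To make the proposal concrete, here is a minimal sketch of the behaviour
I have in mind - this is not the actual ethmonitor code, and the
link-check helper is a placeholder for its real link detection:

```shell
#!/bin/sh
# Sketch of the proposed 'monitor_force_fail' parameter for ethmonitor.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

interface_is_up() {
    # Reads the kernel's carrier state; fails for a missing/down link.
    [ "$(cat "/sys/class/net/$1/carrier" 2>/dev/null)" = "1" ]
}

ethmonitor_monitor() {
    if interface_is_up "$OCF_RESKEY_interface"; then
        return $OCF_SUCCESS
    fi
    if [ "${OCF_RESKEY_monitor_force_fail:-false}" = "true" ]; then
        # Proposed behaviour: fail the monitor so the cluster reacts
        # (e.g. recover elsewhere, or fence with on-fail=fence).
        return $OCF_ERR_GENERIC
    fi
    # Current behaviour: only a node attribute is updated (omitted
    # here); the monitor itself still reports success.
    return $OCF_SUCCESS
}
```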


    > you might use the rather new feature priority-fencing-delay (give
    > the node that is running valuable resources a benefit in the race)
    > or go for fence_heuristics_ping (a pseudo fence-resource that,
    > together with a fencing-topology, prevents the node without access
    > to a certain IP from fencing the other node).
    >
    > https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
    >
    > https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
    >
    > Klaus
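Wired into a fencing topology, the heuristics agent might look roughly
like this - device names, addresses, and credentials are placeholders,
and the real fence device here is IPMI purely as an example:

```shell
# Heuristics device: "succeeds" only if this node can ping the gateway.
pcs stonith create fence-heuristic fence_heuristics_ping \
    ping_targets=192.168.1.1

# The real fence device.
pcs stonith create fence-ipmi fence_ipmilan ip=10.0.0.10 \
    username=admin password=secret pcmk_host_list="node1 node2"

# Level 1 requires the heuristic to pass before the real device fires,
# so a node that lost external connectivity cannot win the fence race.
pcs stonith level add 1 node1 fence-heuristic fence-ipmi
pcs stonith level add 1 node2 fence-heuristic fence-ipmi
```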
    > _______________________________________________
    >>
    >> Manage your subscription:
    >> https://lists.clusterlabs.org/mailman/listinfo/users
    >>
    >> ClusterLabs home: https://www.clusterlabs.org/


