Thanks Ken. Let me check the resource-stickiness property at my end.
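If I understand the docs correctly, something like the following should
raise stickiness so a recovered node does not take the master role back
(a sketch using pcs as shipped on CentOS 7; "my-master" is a placeholder
for the actual resource id in my configuration):

    # Set a cluster-wide default so resources prefer to stay where
    # they are (the built-in default stickiness is 0):
    pcs resource defaults resource-stickiness=100

    # Or set it only on one resource, e.g. a hypothetical "my-master":
    pcs resource meta my-master resource-stickiness=100

    # Show the resulting allocation scores on the live cluster:
    crm_simulate -sL

Regards,
Rohit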
On Tue, Aug 25, 2020 at 8:07 PM Ken Gaillot <[email protected]> wrote:
> On Tue, 2020-08-25 at 12:28 +0530, Rohit Saini wrote:
> > Hi All,
> > I am seeing the following behavior. Can someone clarify whether this
> > is intended behavior? If yes, then why? Please let me know if logs
> > are needed for better clarity.
> >
> > 1. Without Stonith:
> > Continuously killing corosync on the master causes a switchover and
> > makes another node the master. But as soon as corosync recovers, the
> > original node becomes master again. Shouldn't it become a slave now?
>
> Where resources are active, or take on the master role, depends on the
> cluster configuration, not on past node issues.
>
> You may be interested in the resource-stickiness property:
>
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes
>
> > 2. With Stonith:
> > Sometimes, when corosync is killed, the node gets shot by stonith,
> > but sometimes not. I am not able to understand this fluctuating
> > behavior. Does it have anything to do with a faster recovery of
> > corosync, which stonith fails to detect?
>
> It's not failing to detect it, but recovering satisfactorily without
> fencing.
>
> At any given time, one of the cluster nodes is elected the designated
> controller (DC). When new events occur, such as a node leaving the
> corosync ring unexpectedly, the DC runs pacemaker's scheduler to see
> what needs to be done about it. In the case of a lost node, it will
> also erase the node's resource history, to indicate that the state of
> resources on the node is no longer accurately known.
>
> If no further events happened during that time, the scheduler would
> schedule fencing, and the cluster would carry it out.
>
> However, systemd monitors corosync and will restart it if it dies. If
> systemd respawns corosync fast enough (it is often sub-second), the
> node will rejoin the cluster before the scheduler completes its
> calculations and fencing is initiated. Rejoining the cluster includes
> re-syncing its resource history with the other nodes.
>
> The node join is considered new information, so the previous scheduler
> run is cancelled (the "transition" is "aborted") and a new one is
> started. Since the node is now happily part of the cluster, and the
> resource history tells us the state of all resources on the node, no
> fencing is needed.
>
> > I am using:
> > corosync-2.4.5-4.el7.x86_64
> > pacemaker-1.1.19-8.el7.x86_64
> > CentOS 7.6.1810
> >
> > Thanks,
> > Rohit
>
> --
> Ken Gaillot <[email protected]>
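P.S. To check the respawn behaviour Ken describes above, I plan to look
at the corosync unit and the fencing history on my nodes, along these
lines (commands from systemd and pacemaker as available on CentOS 7;
exact unit settings vary by distribution and package version):

    # Does systemd restart corosync on failure?
    systemctl cat corosync | grep -i restart

    # Recent corosync starts/stops in the journal:
    journalctl -u corosync --since=-1h

    # Fencing operations the cluster has actually carried out:
    stonith_admin --history '*'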
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
