Thanks Ken. Let me check the resource-stickiness property at my end.
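If I understand the docs correctly, something like the following should
raise stickiness so a recovered node does not take the master role back
(a sketch using pcs as shipped on CentOS 7; "my-master" is a placeholder
for the actual resource id in my configuration):

    # Set a cluster-wide default so resources prefer to stay where
    # they are (the built-in default stickiness is 0):
    pcs resource defaults resource-stickiness=100

    # Or set it only on one resource, e.g. a hypothetical "my-master":
    pcs resource meta my-master resource-stickiness=100

    # Show the resulting allocation scores on the live cluster:
    crm_simulate -sL

Regards,
Rohit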
On Tue, Aug 25, 2020 at 8:07 PM Ken Gaillot <[email protected]> wrote:
> On Tue, 2020-08-25 at 12:28 +0530, Rohit Saini wrote:
> > Hi All,
> > I am seeing the following behavior. Can someone clarify whether this
> > is intended behavior? If yes, then why? Please let me know if logs
> > are needed for better clarity.
> >
> > 1. Without Stonith:
> > Continuously killing corosync on the master causes a switchover and
> > makes another node the master. But as soon as corosync recovers, the
> > original node becomes master again. Shouldn't it become a slave now?
>
> Where resources are active, or take on the master role, depends on the
> cluster configuration, not on past node issues.
>
> You may be interested in the resource-stickiness property:
>
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes
>
> > 2. With Stonith:
> > Sometimes, when corosync is killed, the node gets shot by stonith,
> > but sometimes not. I am not able to understand this fluctuating
> > behavior. Does it have anything to do with a faster recovery of
> > corosync, which stonith fails to detect?
>
> It's not failing to detect it, but recovering satisfactorily without
> fencing.
>
> At any given time, one of the cluster nodes is elected the designated
> controller (DC). When new events occur, such as a node leaving the
> corosync ring unexpectedly, the DC runs pacemaker's scheduler to see
> what needs to be done about it. In the case of a lost node, it will
> also erase the node's resource history, to indicate that the state of
> resources on the node is no longer accurately known.
>
> If no further events happened during that time, the scheduler would
> schedule fencing, and the cluster would carry it out.
>
> However, systemd monitors corosync and will restart it if it dies. If
> systemd respawns corosync fast enough (it is often sub-second), the
> node will rejoin the cluster before the scheduler completes its
> calculations and fencing is initiated. Rejoining the cluster includes
> re-syncing its resource history with the other nodes.
>
> The node join is considered new information, so the previous scheduler
> run is cancelled (the "transition" is "aborted") and a new one is
> started. Since the node is now happily part of the cluster, and the
> resource history tells us the state of all resources on the node, no
> fencing is needed.
>
> > I am using:
> > corosync-2.4.5-4.el7.x86_64
> > pacemaker-1.1.19-8.el7.x86_64
> > CentOS 7.6.1810
> >
> > Thanks,
> > Rohit
>
> --
> Ken Gaillot <[email protected]>
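P.S. To check the respawn behaviour Ken describes above, I plan to look
at the corosync unit and the fencing history on my nodes, along these
lines (commands from systemd and pacemaker as available on CentOS 7;
exact unit settings vary by distribution and package version):

    # Does systemd restart corosync on failure?
    systemctl cat corosync | grep -i restart

    # Recent corosync starts/stops in the journal:
    journalctl -u corosync --since=-1h

    # Fencing operations the cluster has actually carried out:
    stonith_admin --history '*'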
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
