On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote:

<snip>

> > If pacemaker tries to stop resources due to out of quorum condition,
> > you could set suitable failure-timeout; this will be equivalent to
> > using "pcs resource refresh". Keep in mind that pacemaker only checks
> > for failure-timeout expiration every cluster-recheck-interval (15
> > minutes by default).

That's true only for Pacemaker versions older than 2.0.3; since 2.0.3,
the cluster rechecks as soon as the timeout hits.

> > This still is not directly related to network availability, but if
> > network outage resulted in node going out of quorum, when network is
> > back and node joined cluster again it will allow resources to be
> > started on node.
> >
>
> When quorum is lost I want all the resources to stop. The cluster is
> performing this step correctly for me.

That holds only as long as the node itself is working properly. If
quorum is lost because one of the nodes is malfunctioning -- maybe a
device driver locked up the system, or CPU wait is horrific due to an
out-of-control process or disk failure -- then that node will not know
quorum has been lost and will not stop resources. If the condition then
clears up, suddenly you have split-brain with two nodes running
resources.

>
> That cluster-recheck-interval would explain the intermittence I saw
> this morning. If I set that to 1 minute would that cause any gross
> negative issues?

It increases CPU usage and IPC traffic. For Pacemaker 2.0.3 or later, I
definitely wouldn't bother. For older versions, 1 minute feels a bit
much; I would go with around 5 minutes.

>
> Is there another setting besides cluster-recheck-interval to consider
> adjusting to start mysql when quorum is returned?
>
> Thank you for the feedback.
>
> -John

-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
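
To put rough numbers on the recheck timing discussed in this thread, here is a minimal Python sketch. It is only a toy model, not Pacemaker code: the function names are hypothetical, and it assumes rechecks on older versions fire at fixed multiples of cluster-recheck-interval, whereas the real timer is tied to cluster activity. It estimates when an expired failure-timeout would be noticed on versions before and after 2.0.3.

```python
import math


def parse_version(text):
    """Turn a version string such as '2.0.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def failure_cleared_at(fail_time, failure_timeout, recheck_interval, version):
    """Estimate when the cluster notices that a failure has expired.

    All times are in seconds. Hypothetical model: before 2.0.3 the expiry
    is only noticed at the next periodic recheck, assumed here to fire at
    fixed multiples of recheck_interval; from 2.0.3 on it is noticed at
    the moment the failure-timeout expires.
    """
    expiry = fail_time + failure_timeout
    if parse_version(version) >= (2, 0, 3):
        return expiry
    # Older versions: round up to the next recheck tick at or after expiry.
    return math.ceil(expiry / recheck_interval) * recheck_interval


if __name__ == "__main__":
    # A failure at t=10s with failure-timeout=60s expires at t=70s.
    for interval in (900, 300, 60):  # 15 min (default), 5 min, 1 min
        old = failure_cleared_at(10, 60, interval, "2.0.2")
        new = failure_cleared_at(10, 60, interval, "2.0.3")
        print(f"recheck={interval:>3}s  pre-2.0.3: t={old}s  2.0.3+: t={new}s")
```

Under these assumptions a shorter interval only shrinks the delay on older versions; on 2.0.3 or later the result does not depend on the interval at all, which is why lowering it there buys nothing.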