> On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote:
>
> <snip>
>
>> > If pacemaker tries to stop resources due to out of quorum condition,
>> > you could set suitable failure-timeout; this will be equivalent to
>> > using "pcs resource refresh". Keep in mind that pacemaker only checks
>> > for failure-timeout expiration every cluster-recheck-interval (15
>
> That's true only for Pacemaker versions less than 2.0.3; since 2.0.3,
> the cluster rechecks as soon as the timeout hits.

I'm using Pacemaker 2.0.5 and it is *not* starting MySQL when quorum is
restored, at least not every time (it worked in only ~1 test in 10). So I
have seen it work before, but I'm more willing to believe that there was
a user error in that one successful sample.

We (actually a teammate) got MySQL to start when quorum is restored. It
required both setting cluster-recheck-interval to something shorter than
the 15-minute default and setting the mysql resource's failure-timeout to
a non-zero value. In our case we set both to 1 minute, with good results
for the last few tests. We can raise the interval to something greater
than 1 minute, but for our tests 1 minute proves it out.

>> > minutes by default). This still is not directly related to network
>> > availability, but if network outage resulted in node going out of
>> > quorum, when network is back and node joined cluster again it will
>> > allow resources to be started on node.
>> >
>>
>> When quorum is lost I want all the resources to stop. The cluster is
>> performing this step correctly for me.
>
> As long as it's working properly. If quorum is lost because one of the
> nodes is malfunctioning -- maybe a device driver locked up the system,
> or CPU wait is horrific due to an out-of-control process or disk
> failure -- then that node will not know quorum has been lost and will
> not stop resources. If the condition then clears up, suddenly you have
> split-brain with two nodes running resources.
>
>>
>> That cluster-recheck-interval would explain the intermittence I saw
>> this morning. If I set that to 1 minute would that cause any gross
>> negative issues?
>
> It increases CPU usage and IPC traffic. For Pacemaker 2.0.3 or later, I
> definitely wouldn't bother. For older versions, 1 minute feels a bit
> much, I would go with around 5.
>
>>
>> Is there another setting besides cluster-recheck-interval to consider
>> adjusting to start mysql when quorum is returned?
>>
>> Thank you for the feedback.
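
For anyone finding this thread later, the two settings that made the
difference for us (cluster-recheck-interval and the resource's
failure-timeout) can be set with pcs roughly as sketched below. This
assumes the resource ID is literally "mysql", which may not match your
configuration, and uses the 1-minute values from our tests. Treat it as a
sketch, not a recommended configuration.

```shell
# Assumption: the resource ID is "mysql"; substitute your own resource ID.
RESOURCE=mysql

# Cluster-wide property: re-evaluate the cluster every minute
# instead of the 15-minute default.
pcs property set cluster-recheck-interval=1min

# Resource meta-attribute: let recorded failures expire after one minute
# so the resource becomes eligible to start again.
pcs resource meta "$RESOURCE" failure-timeout=1min
```

As noted earlier in the thread, a shorter cluster-recheck-interval
increases CPU usage and IPC traffic, so something around 5 minutes may be
a better fit outside of testing.
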
>>
>> -John
>
> --
> Ken Gaillot
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
