17.02.2019 0:44, Eric Robinson wrote:
> Thanks for the feedback, Andrei.
>
> I only want cluster failover to occur if the filesystem or drbd resources
> fail, or if the cluster messaging layer detects a complete node failure. Is
> there a way to tell PaceMaker not to trigger a cluster failover if any of the
> p_mysql resources fail?
>

The closest you can get is disabling the recurring monitor action. In that
case pacemaker will effectively ignore any change in resource state.
Unfortunately, this also means your resource agent must correctly handle
requests made in the wrong state, i.e. it must be able to stop a resource
that has already failed without returning an error to pacemaker.

You may set the resource to "unmanaged", but this will also prevent pacemaker
from starting or stopping the resource at all. As a compromise, you may set
"unmanaged" after the resource has been started and unset it before stopping,
but then you have exactly the same issue: if the resource has failed,
pacemaker will trigger the corresponding action as soon as you manage it
again.

Pacemaker's design is different from any other cluster resource monitor I
have seen. Pacemaker is designed to maintain the target resource state at any
cost. It has no notion of "important" or "unimportant" resources at all. Even
playing with scores won't help, because a failed resource outweighs
everything else with a -INFINITY score, thus pushing everything dependent
away from its current node.

In this particular case it may be argued that pacemaker's reaction is
unjustified. The administrator explicitly set the target state to "stop"
(otherwise pacemaker would not attempt to stop it), so it is unclear why it
tries to restart it on another node.

>> -----Original Message-----
>> From: Users <users-boun...@clusterlabs.org> On Behalf Of Andrei
>> Borzenkov
>> Sent: Saturday, February 16, 2019 1:34 PM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One
>> Fails?
>>
>> 17.02.2019 0:03, Eric Robinson wrote:
>>> Here are the relevant corosync logs.
>>>
>>> It appears that the stop action for resource p_mysql_002 failed, and that
>>> caused a cascading series of service changes. However, I don't understand
>>> why, since no other resources are dependent on p_mysql_002.
>>>
>>
>> You have mandatory colocation constraints for each SQL resource with VIP. It
>> means that to move SQL resource to another node pacemaker also must
>> move VIP to another node which in turn means it needs to move all other
>> dependent resources as well.
>> ...
>>> Feb 16 14:06:39 [3912] 001db01a pengine: warning: check_migration_threshold: Forcing p_mysql_002 away from 001db01a after 1000000 failures (max=1000000)
>> ...
>>> Feb 16 14:06:39 [3912] 001db01a pengine: notice: LogAction: * Stop p_vip_clust01 ( 001db01a ) blocked
>> ...
>>> Feb 16 14:06:39 [3912] 001db01a pengine: notice: LogAction: * Stop p_mysql_001 ( 001db01a ) due to colocation with p_vip_clust01
>>
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
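
To illustrate the cascade described in the quoted reply (each SQL resource has a mandatory colocation with the VIP, so forcing one SQL resource away drags the VIP and every other dependent resource with it), here is a toy model in Python. It is only a sketch and not how pacemaker's scheduler is implemented: it assumes a mandatory colocation ties both resources to the same node, so moving either one forces the other to move. The function name `resources_forced_to_move` and the resource `p_mysql_003` are hypothetical; the other resource names come from the logs above.

```python
from collections import deque


def resources_forced_to_move(colocations, failed):
    """Return the set of resources that must leave the node together with `failed`.

    `colocations` is a list of (dependent, target) pairs, each read as
    "dependent must run on the same node as target".
    Simplifying assumption: the tie is treated as symmetric, so the result
    is the connected component that contains the failed resource.
    """
    # Build an undirected adjacency map from the colocation pairs.
    neighbours = {}
    for dependent, target in colocations:
        neighbours.setdefault(dependent, set()).add(target)
        neighbours.setdefault(target, set()).add(dependent)

    # Breadth-first walk starting from the resource being forced away.
    moved = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in moved:
                moved.add(other)
                queue.append(other)
    return moved


colocations = [
    ("p_mysql_001", "p_vip_clust01"),
    ("p_mysql_002", "p_vip_clust01"),
    ("p_mysql_003", "p_vip_clust01"),  # hypothetical additional instance
]
affected = resources_forced_to_move(colocations, "p_mysql_002")
print(sorted(affected))
```

In this model a failure of p_mysql_002 alone is enough to pull in p_vip_clust01 and, through it, every other SQL resource, even though nothing is colocated with p_mysql_002 directly.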