On 17.02.2019 0:33, Andrei Borzenkov wrote:
> On 17.02.2019 0:03, Eric Robinson wrote:
>> Here are the relevant corosync logs.
>>
>> It appears that the stop action for resource p_mysql_002 failed, and that
>> caused a cascading series of service changes. However, I don't understand
>> why, since no other resources are dependent on p_mysql_002.
>>
>
> You have mandatory colocation constraints for each SQL resource with
> VIP. It means that to move SQL resource to another node pacemaker also
> must move VIP to another node which in turn means it needs to move all
> other dependent resources as well.
> ...
>> Feb 16 14:06:39 [3912] 001db01a pengine: warning:
>> check_migration_threshold: Forcing p_mysql_002 away from 001db01a
>> after 1000000 failures (max=1000000)
> ...
>> Feb 16 14:06:39 [3912] 001db01a pengine: notice: LogAction: *
>> Stop p_vip_clust01 ( 001db01a ) blocked
> ...
>> Feb 16 14:06:39 [3912] 001db01a pengine: notice: LogAction: *
>> Stop p_mysql_001 ( 001db01a ) due to
>> colocation with p_vip_clust01
>
There is apparently more to it. Note that the p_vip_clust01 stop operation is
"blocked". That is because a mandatory order constraint is symmetrical by
default: to move the VIP, pacemaker first needs to stop it on the current
node, but before it can stop the VIP it needs to (be able to) stop
p_mysql_002. It cannot do that, because by default, when "stop" fails and
there is no stonith, the resource is blocked and no further actions are
possible, i.e. pacemaker will no longer even try to stop it.

I still consider this rather questionable behavior. I tried to reproduce it
and I see the same.

1. After this happens, resource p_mysql_002 has target=Stopped in the CIB.
Why, oh why, does pacemaker try to "force away" a resource that is not going
to be started on another node anyway?

2. pacemaker knows that it cannot stop (and hence move) p_vip_clust01, yet it
will happily stop all resources that depend on it in preparation for moving
them, and then leave them stopped because it cannot move them. Resources are
neither restarted on the current node nor moved to another node. At this
point I'd expect pacemaker to be smart enough not to even initiate actions
that are known to be unsuccessful.

The best we can do at this point is set symmetrical=false, which allows the
move to actually happen, but it still means downtime for the resources that
are moved, and it has its own can of worms in the normal case.
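
To make the blocking chain explicit, here is a toy model in Python. It is
only a sketch of the reasoning in this message, not pacemaker's scheduler:
the function name can_stop_vip and the ORDER_THEN list are made up for
illustration, and the model assumes the ordering is "start the VIP, then
start each SQL resource".

```python
# Toy model only: this is NOT pacemaker's scheduler, just the dependency
# reasoning from the message written out explicitly.

# Assumed mandatory ordering: start the VIP, then start each SQL resource.
# With symmetrical=true (the default) the reverse applies to stops: every
# SQL resource must be stopped before the VIP may stop.
ORDER_FIRST = "p_vip_clust01"
ORDER_THEN = ["p_mysql_001", "p_mysql_002"]


def can_stop_vip(blocked, symmetrical=True):
    """Return True if the VIP stop is not held up by a blocked dependent."""
    if not symmetrical:
        # No implied stop ordering, so the VIP stop does not wait on anyone.
        return True
    # A blocked resource (failed stop, no stonith) is never stopped again,
    # so a VIP stop that has to follow it can never run either.
    return not any(rsc in blocked for rsc in ORDER_THEN)


# p_mysql_002 failed to stop and there is no fencing, so it is blocked.
blocked = {"p_mysql_002"}

print(can_stop_vip(blocked, symmetrical=True))   # False: VIP stop is blocked
print(can_stop_vip(blocked, symmetrical=False))  # True: the move can proceed
```

The model deliberately leaves out point 2 above (dependents being stopped
even though the VIP cannot move); it only shows why the VIP stop itself is
blocked and why symmetrical=false unblocks it.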