> On 21 Aug 2015, at 1:32 pm, Andrei Borzenkov <[email protected]> wrote:
>
> 21.08.2015 00:35, Brian Campbell wrote:
>> I have a master/slave resource (with a custom resource agent) which,
>> if it uncleanly shut down, will return OCF_FAILED_MASTER on the next
>> "monitor" operation. This seems to be what
>> http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html
>> suggests that exit code should be used for.
>>
>> After the node is fenced, and comes up again, Pacemaker probes all of
>> the resources. It gets the OCF_FAILED_MASTER exit code, and decides
>> that it needs to demote the resource. So it executes the demote
>> action. My resource agent returns an error on a demote action if it is
>> not running, which seems to be the suggested behavior according to
>> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>>
>> This then causes Pacemaker to log a failure for the "demote" action,
>> and then try to recover by stopping (which succeeds cleanly because
>> the resource is stopped) followed by starting it again (which again
>> succeeds, as we can start in slave mode from a failed state). So the
>> end state is correct, but crm_mon shows a failed action that you need
>> to clear out:
>>
>> Failed actions:
>>
>> editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0
>> (node=es-efs-master2, call=73, rc=1, status=complete,
>> last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms): unknown error
>>
>> I'm curious about whether the behavior of my resource agent is
>> correct. Should I not be returning OCF_FAILED_MASTER upon the
>> "monitor" operation if the resource isn't started?
>
> Correct. If resource is not started it cannot be master or slave; it can
> become master only after pacemaker requested it. Unexpected master would be
> just the same error as well.
>
> If you can determine that one resource instance is more suitable to become
> master than another one, you should set master score respectively so
> pacemaker will promote correct instance.
>
>> Or should the
>> "demote" operation do something different in this state, like actually
>> starting up the slave?
>>
>
> In general, if current resource state is the same as would be after operation
> is completed, there is absolutely no reason to return error - just pretend
> operation succeeded.

Always return the actual state, i.e. OCF_NOT_RUNNING in these two cases.
Only return OCF_FAILED_MASTER if you know enough to say that it's in the
master state (e.g. via a lock file or similar mechanism) but not able to
handle requests.

>
>> It seems like the behavior of Pacemaker is different than what's
>> documented in the resource agent guide, so I'm trying to figure out if
>> this is a bug in my resource agent, a bug in Pacemaker, a
>> misunderstanding on my part, or actually intended behavior.
>>
>> -- Brian
>>
>> _______________________________________________
>> Users mailing list: [email protected]
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
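
To make the "return the actual state" rule concrete, here is a rough sketch of how a monitor action could pick its exit code. The numeric codes are the standard OCF ones; the function name and its three boolean inputs (process_running, master_marker_present, serving_requests) are hypothetical and stand in for whatever checks a real agent performs. Treating a leftover master marker on a stopped instance as plain "not running" is my reading of the advice above, not something taken from an existing agent.

```python
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0          # running (as slave, for a master/slave resource)
OCF_ERR_GENERIC = 1      # running, but failed
OCF_NOT_RUNNING = 7      # cleanly stopped / not running at all
OCF_RUNNING_MASTER = 8   # running as master
OCF_FAILED_MASTER = 9    # known to be master, but failed


def monitor_exit_code(process_running, master_marker_present, serving_requests):
    """Map the observed state to an OCF monitor exit code.

    All three arguments are hypothetical inputs for illustration:
    process_running       -- the service process exists
    master_marker_present -- a lock file (or similar) says "I am master"
    serving_requests      -- the service answers a health check
    """
    if not process_running:
        # Nothing is running, so it is neither master nor slave.
        # A stale marker left by an unclean shutdown does not change that.
        return OCF_NOT_RUNNING
    if master_marker_present:
        # We know it is in the master role; report whether it works.
        return OCF_RUNNING_MASTER if serving_requests else OCF_FAILED_MASTER
    # Running in the slave role.
    return OCF_SUCCESS if serving_requests else OCF_ERR_GENERIC


if __name__ == "__main__":
    # Probe after an unclean shutdown: process gone, stale marker left behind.
    print(monitor_exit_code(False, True, False))
```

With this mapping the probe after fencing reports "not running", so Pacemaker has no reason to schedule a demote first.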
