> On 29 Aug 2015, at 1:24 am, Brian Campbell <[email protected]> wrote:
>
> On Fri, Aug 28, 2015 at 12:14 AM, Andrew Beekhof <[email protected]> wrote:
>>
>>> On 21 Aug 2015, at 1:32 pm, Andrei Borzenkov <[email protected]> wrote:
>>>
>>> 21.08.2015 00:35, Brian Campbell wrote:
>>>> I have a master/slave resource (with a custom resource agent) which,
>>>> if it was uncleanly shut down, will return OCF_FAILED_MASTER on the next
>>>> "monitor" operation. This seems to be what
>>>> http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html
>>>> suggests that exit code should be used for.
>>>>
>>>> After the node is fenced, and comes up again, Pacemaker probes all of
>>>> the resources. It gets the OCF_FAILED_MASTER exit code, and decides
>>>> that it needs to demote the resource. So it executes the demote
>>>> action. My resource agent returns an error on a demote action if it is
>>>> not running, which seems to be the suggested behavior according to
>>>> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>>>>
>>>> This then causes Pacemaker to log a failure for the "demote" action,
>>>> and then try to recover by stopping (which succeeds cleanly because
>>>> the resource is stopped) followed by starting it again (which again
>>>> succeeds, as we can start in slave mode from a failed state). So the
>>>> end state is correct, but crm_mon shows a failed action that you need
>>>> to clear out:
>>>>
>>>> Failed actions:
>>>>
>>>> editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0
>>>> (node=es-efs-master2, call=73, rc=1, status=complete,
>>>> last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms):
>>>> unknown error
>>>>
>>>> I'm curious about whether the behavior of my resource agent is
>>>> correct. Should I not be returning OCF_FAILED_MASTER upon the
>>>> "monitor" operation if the resource isn't started?
>>>
>>> Correct.
>>> If the resource is not started it cannot be master or slave; it can
>>> become master only after pacemaker has requested it. An unexpected master
>>> would be an error just the same.
>>>
>>> If you can determine that one resource instance is more suitable to become
>>> master than another one, you should set the master score accordingly so
>>> pacemaker will promote the correct instance.
>>>
>>>> Or should the
>>>> "demote" operation do something different in this state, like actually
>>>> starting up the slave?
>>>>
>>>
>>> In general, if the current resource state is the same as it would be after
>>> the operation completed, there is absolutely no reason to return an error -
>>> just pretend the operation succeeded.
>>
>> Always return the actual state, i.e. OCF_NOT_RUNNING in these two cases.
>>
>> Only return OCF_FAILED_MASTER if you know enough to say that it's in the
>> master state (i.e. lock file, or similar mechanism) but not able to handle
>> requests.
>
> Thanks for the clarifications!
>
> So it sounds like I should be returning OCF_NOT_RUNNING from the
> monitor operation even if I detect that it was uncleanly shut down in
> the master state earlier,

It really depends on whether you need any cleanup to happen.

Need cleanup: OCF_FAILED_MASTER
_Safely_ stopped: OCF_NOT_RUNNING

> and only return OCF_FAILED_MASTER if it is
> running in the master state but failed for some reason, so it needs a
> demote or stop.
>
> -- Brian
>
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
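
For reference, the monitor decision discussed in this thread could be sketched roughly as below. This is only an illustration in Python, not a real resource agent: the function name `monitor_rc` and its four boolean flags are hypothetical, and how an agent would actually detect each condition (lock file, process check, and so on) is left out. The numeric values are the standard OCF exit codes.

```python
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0         # running (as slave, for a master/slave resource)
OCF_ERR_GENERIC = 1     # generic failure
OCF_NOT_RUNNING = 7     # cleanly stopped
OCF_RUNNING_MASTER = 8  # running as master
OCF_FAILED_MASTER = 9   # in master state but failed


def monitor_rc(running, is_master, healthy, needs_cleanup):
    """Map observed state to a monitor exit code.

    All four flags are hypothetical inputs; a real agent would derive
    them from its own checks.
    """
    if not running:
        # Not running: report a failed master only if leftovers from an
        # unclean master shutdown still need cleanup; otherwise the
        # resource is safely stopped.
        return OCF_FAILED_MASTER if needs_cleanup else OCF_NOT_RUNNING
    if is_master:
        # Known to be in the master state: healthy or failed master.
        return OCF_RUNNING_MASTER if healthy else OCF_FAILED_MASTER
    # Running as slave.
    return OCF_SUCCESS if healthy else OCF_ERR_GENERIC


if __name__ == "__main__":
    # Node came back after fencing, nothing running, nothing to clean up.
    print(monitor_rc(running=False, is_master=False,
                     healthy=False, needs_cleanup=False))
```

With a mapping like this, a probe on a freshly rebooted node that needs no cleanup reports OCF_NOT_RUNNING, so Pacemaker has no reason to schedule a demote on a stopped resource.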
