I have a master/slave resource (with a custom resource agent) which, if it was shut down uncleanly, will return OCF_FAILED_MASTER on the next "monitor" operation. That seems to be the use that http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html suggests for that exit code.
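To make the monitor behavior concrete, here is a minimal sketch of the decision it makes, written in Python purely for illustration (the real agent is not shown here). The function name `monitor` and its three boolean inputs are hypothetical stand-ins for however an agent detects its state, and the results for the running cases are assumptions about typical agent behavior; only the numeric exit codes come from the OCF resource agent API.

```python
# OCF exit codes as defined by the OCF resource agent API.
OCF_SUCCESS = 0          # running (as slave, for a master/slave resource)
OCF_ERR_GENERIC = 1      # generic failure ("unknown error" in crm_mon)
OCF_NOT_RUNNING = 7      # cleanly stopped
OCF_RUNNING_MASTER = 8   # running as master
OCF_FAILED_MASTER = 9    # master role, but in a failed state


def monitor(process_running: bool, is_master: bool, unclean_shutdown: bool) -> int:
    """Hypothetical monitor logic; all three inputs are illustrative flags."""
    if process_running:
        # Assumption: a healthy instance reports its current role.
        return OCF_RUNNING_MASTER if is_master else OCF_SUCCESS
    if unclean_shutdown:
        # The case described above: not running, but it went down uncleanly.
        return OCF_FAILED_MASTER
    return OCF_NOT_RUNNING


if __name__ == "__main__":
    # Probe after a fenced node comes back: nothing running, unclean marker set.
    print(monitor(process_running=False, is_master=False, unclean_shutdown=True))
```

The case in question is the second branch: the probe on a freshly rebooted node finds nothing running, yet the result it reports is OCF_FAILED_MASTER rather than OCF_NOT_RUNNING.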
After the node is fenced, and comes up again, Pacemaker probes all of the resources. It gets the OCF_FAILED_MASTER exit code, and decides that it needs to demote the resource. So it executes the demote action. My resource agent returns an error on a demote action if it is not running, which seems to be the suggested behavior according to http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html

This then causes Pacemaker to log a failure for the "demote" action, and then try to recover by stopping (which succeeds cleanly because the resource is stopped) followed by starting it again (which again succeeds, as we can start in slave mode from a failed state).

So the end state is correct, but crm_mon shows a failed action that you need to clear out:

Failed actions:
    editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0 (node=es-efs-master2, call=73, rc=1, status=complete, last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms): unknown error

I'm curious about whether the behavior of my resource agent is correct. Should I not be returning OCF_FAILED_MASTER upon the "monitor" operation if the resource isn't started? Or should the "demote" operation do something different in this state, like actually starting up the slave?

It seems like the behavior of Pacemaker is different than what's documented in the resource agent guide, so I'm trying to figure out if this is a bug in my resource agent, a bug in Pacemaker, a misunderstanding on my part, or actually intended behavior.

-- Brian
