Re: [ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

Andrei Borzenkov Thu, 20 Aug 2015 20:35:02 -0700

On 21.08.2015 00:35, Brian Campbell wrote:

I have a master/slave resource (with a custom resource agent) which,
if it was shut down uncleanly, will return OCF_FAILED_MASTER on the next
"monitor" operation. This seems to be what
http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html
suggests that exit code should be used for.


After the node is fenced and comes up again, Pacemaker probes all of
the resources. It gets the OCF_FAILED_MASTER exit code, and decides
that it needs to demote the resource. So it executes the demote
action. My resource agent returns an error on a demote action if it is
not running, which seems to be the suggested behavior according to
http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html

This then causes Pacemaker to log a failure for the "demote" action,
and then try to recover by stopping (which succeeds cleanly because
the resource is stopped) followed by starting it again (which again
succeeds, as we can start in slave mode from a failed state). So the
end state is correct, but crm_mon shows a failed action that you need
to clear out:

Failed actions:
    editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0 (node=es-efs-master2, call=73, rc=1, status=complete, last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms): unknown error

I'm curious about whether the behavior of my resource agent is
correct. Should I not be returning OCF_FAILED_MASTER upon the
"monitor" operation if the resource isn't started?

Correct. If the resource is not started, it cannot be master or slave, so monitor should report it as not running (OCF_NOT_RUNNING). It can become master only after Pacemaker has requested it; an unexpected master would be just the same kind of error.
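To make this concrete, here is a minimal Python sketch of the monitor decision (real agents are usually shell scripts). The numeric values are the standard OCF exit codes; the boolean inputs running, acting_as_master and healthy are hypothetical stand-ins for whatever checks your agent actually performs (PID file, socket, status command).

```python
from enum import IntEnum


class OCF(IntEnum):
    """Subset of the standard OCF resource agent exit codes."""
    SUCCESS = 0
    ERR_GENERIC = 1
    NOT_RUNNING = 7
    RUNNING_MASTER = 8
    FAILED_MASTER = 9


def monitor(running: bool, acting_as_master: bool, healthy: bool) -> OCF:
    """Map the observed local state to a monitor exit code.

    running, acting_as_master and healthy are hypothetical probe results.
    """
    if not running:
        # A stopped instance is neither master nor slave, even if the last
        # shutdown was unclean; in this sketch any cleanup is left to start.
        return OCF.NOT_RUNNING
    if acting_as_master:
        # OCF_FAILED_MASTER is only for an instance that is up as master
        # but not working correctly.
        return OCF.RUNNING_MASTER if healthy else OCF.FAILED_MASTER
    # Running as slave.
    return OCF.SUCCESS if healthy else OCF.ERR_GENERIC
```

With logic like this, the probe after fencing reports OCF_NOT_RUNNING, so Pacemaker should start the instance rather than schedule a demote of something that is not running.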

If you can determine that one resource instance is more suitable to become master than another, you should set the master scores accordingly, so Pacemaker will promote the correct instance.
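A minimal sketch of such a policy, assuming the agent can tell whether its local data is in sync (data_in_sync is a hypothetical check, and the values 100 and 1 are arbitrary). In a real agent the resulting score is typically published from the monitor action with Pacemaker's crm_master helper, which wraps crm_attribute; Pacemaker then prefers the instance with the highest master score when choosing which one to promote.

```python
from typing import Optional


def master_score(running: bool, data_in_sync: bool) -> Optional[int]:
    """Hypothetical policy for the master score this node advertises.

    Returns None when no score should be advertised at all (a stopped
    instance is not a promotion candidate); otherwise a higher value
    marks a better candidate for promotion.
    """
    if not running:
        return None
    # Prefer an instance whose data is known to be current; an out-of-sync
    # instance still gets a low positive score as a last resort.
    return 100 if data_in_sync else 1
```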

Or should the
"demote" operation do something different in this state, like actually
starting up the slave?

In general, if the current resource state is already the same as it would be after the operation completes, there is absolutely no reason to return an error; just pretend the operation succeeded.
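Applied to demote, the idea looks roughly like this minimal Python sketch. Here current is the code the monitor logic returned, and demote_instance is a hypothetical callable that performs the actual switch to slave and reports whether it worked. The stopped case keeps the error from the dev guide example; once monitor reports a stopped instance as OCF_NOT_RUNNING, Pacemaker should not be asking to demote it in the first place.

```python
from enum import IntEnum
from typing import Callable


class OCF(IntEnum):
    """Subset of the standard OCF resource agent exit codes."""
    SUCCESS = 0
    ERR_GENERIC = 1
    NOT_RUNNING = 7
    RUNNING_MASTER = 8
    FAILED_MASTER = 9


def demote(current: OCF, demote_instance: Callable[[], bool]) -> OCF:
    """Demote action; current is the state reported by monitor."""
    if current == OCF.SUCCESS:
        # Already running as slave: the target state is reached, so report
        # success without doing anything.
        return OCF.SUCCESS
    if current == OCF.RUNNING_MASTER:
        # Normal case: actually switch the instance to slave.
        return OCF.SUCCESS if demote_instance() else OCF.ERR_GENERIC
    if current == OCF.NOT_RUNNING:
        # Nothing to demote; report an error, as the dev guide example does.
        return OCF.ERR_GENERIC
    # Anything else (e.g. OCF_FAILED_MASTER): pass it through so the cluster
    # manager can recover the resource.
    return current
```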

It seems like the behavior of Pacemaker is different than what's
documented in the resource agent guide, so I'm trying to figure out if
this is a bug in my resource agent, a bug in Pacemaker, a
misunderstanding on my part, or actually intended behavior.

-- Brian

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
