10.02.2016 13:56, Ferenc Wágner wrote:
Vladislav Bogdanov <bub...@hoster-ok.com> writes:

If pacemaker gets an error on start, it will run stop with the same
set of parameters anyway. It will then get the error again if that error
came from validation and the RA does not differentiate validation for
start and stop. And then circular fencing over the whole cluster is
triggered for no reason.

Of course, for safety, the RA could save its state if start was successful
and skip validation on stop only if that state is not found. Otherwise
a removed binary or config file would result in the resource running on
several nodes.
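
The state-marker idea above can be sketched roughly as follows. This is only an illustration in Python, not code from any real resource agent: the function names, the `state_file` parameter and the file-existence checks standing in for validation are all assumptions made for the example. Only the numeric OCF exit codes are the standard ones.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


def validate(binary, config):
    """Hypothetical validation: check that the binary and config file exist."""
    if not os.path.isfile(binary):
        return OCF_ERR_INSTALLED
    if not os.path.isfile(config):
        return OCF_ERR_CONFIGURED
    return OCF_SUCCESS


def start(binary, config, state_file):
    """Validate, then record a marker once start has succeeded."""
    rc = validate(binary, config)
    if rc != OCF_SUCCESS:
        return rc  # nothing was started, so no marker is written
    # (a real agent would launch the service here)
    with open(state_file, "w") as f:
        f.write("started\n")
    return OCF_SUCCESS


def stop(binary, config, state_file):
    """Skip validation only when no successful start was recorded."""
    if not os.path.exists(state_file):
        # start never succeeded on this node: report a clean stop
        return OCF_SUCCESS
    rc = validate(binary, config)
    if rc != OCF_SUCCESS:
        # the resource may still be running, so the failure must be reported
        return rc
    # (a real agent would stop the service here)
    os.remove(state_file)
    return OCF_SUCCESS
```

With this shape, a failed start followed by stop reports success, while a binary removed after a successful start still makes stop fail. Where the marker lives matters: a real agent would presumably want it in a location that does not survive a reboot.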

What would happen if we made the start operation return OCF_NOT_RUNNING

Well, then the cluster will try to start it again, and that could be undesirable. What are OCF_ERR_INSTALLED and OCF_ERR_CONFIGURED for, then?

if validation fails?  Or more broadly: if the start operation knows that
the resource is not running, and thus a stop operation would do no good.
From Pacemaker Explained B.4: "The cluster will not attempt to stop a
resource that returns this for any action."  The probes could still
return OCF_ERR_CONFIGURED, putting real info into the logs, and a stop
failure could still lead to fencing, protecting data integrity, but
circular fencing would not happen.  I hope.
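
As a rough illustration of the proposal being asked about here (not of how any existing agent behaves), the mapping from a validation result to an exit code could look like the Python sketch below. The function name `result_for` and the action strings are made up for the example. Whether Pacemaker would then behave as hoped is exactly the open question in this thread.

```python
# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def result_for(action, validation_rc):
    """Hypothetical mapping of a validation result to an action's exit code."""
    if validation_rc == OCF_SUCCESS:
        # validation passed; a real agent would carry on with the action
        return OCF_SUCCESS
    if action == "start":
        # proposal: tell the cluster the resource is simply not running
        return OCF_NOT_RUNNING
    # probes (monitor) and stop keep reporting the real validation error
    return validation_rc
```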

By the way, what are the reasons to run stop after a failed start?  To
clean up halfway-started resources?  Besides OCF_ERR_GENERIC, the other
error codes pretty much guarantee that the resource cannot be active.

That heavily depends on how a given RA is implemented...

