Re: [ClusterLabs] Antw: OCF Return codes OCF_NOT_RUNNING

2018-07-11 Thread Ken Gaillot
On Wed, 2018-07-11 at 13:44 +0200, Ulrich Windl wrote:
> >>> Ian Underhill wrote on 11.07.2018 at 13:27 in message:
> > I'm trying to understand the behaviour of pacemaker when a resource
> > monitor returns OCF_NOT_RUNNING instead of OCF_ERR_GENERIC, and
> > whether pacemaker really cares.
> > 
> > The documentation states that a return code OCF_NOT_RUNNING from a
> > monitor will not result in a stop being called on that resource, as
> > it believes the node is still clean.
> > 
> > https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> > 
> > This makes sense, however in practice it is not what happens (unless
> > I'm doing something wrong :) )
> > 
> > When my resource returns OCF_NOT_RUNNING for a monitor call (after
> > a start has been performed), a stop is called.
> 
> Well, it depends. If your start was successful, pacemaker believes
> the resource is running. If the monitor says it's stopped, pacemaker
> seems to try a "clean stop" by calling the stop method (possibly
> before trying to start it again). Am I right?

Yes, I think the documentation is wrong. It depends on what state the
cluster thinks the resource is supposed to be in. If the cluster
expects the resource to be stopped already (for example, when doing a
probe), then "not running" will not result in a stop. If the cluster
expects the resource to be running (for example, when doing a normal
recurring monitor), then the documentation is incorrect: recovery
includes a stop followed by a start.
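
To make the distinction concrete, here is a rough Python sketch of the
two cases above. It is only a toy model, not Pacemaker's scheduler code:
the function name follow_up_actions and the expected_running flag are
made up for illustration. Only the numeric value of OCF_NOT_RUNNING (7)
is taken from the OCF return code table.

```python
# Toy model of the two cases described above; NOT Pacemaker's actual code.
OCF_NOT_RUNNING = 7  # standard OCF return code value


def follow_up_actions(expected_running, rc):
    """Return the actions that follow a monitor/probe result.

    expected_running is a hypothetical flag: True for a recurring monitor
    on a resource the cluster has started, False for a probe of a resource
    the cluster believes is stopped.
    """
    if rc == OCF_NOT_RUNNING and not expected_running:
        # Probe case: the result matches what the cluster expected,
        # so nothing needs to be stopped.
        return []
    if rc == OCF_NOT_RUNNING and expected_running:
        # Recurring monitor case: the resource should be running but is
        # not, so recovery is a stop followed by a start.
        return ["stop", "start"]
    # Other return codes are outside what this thread discusses.
    raise ValueError("case not covered by this sketch")


print(follow_up_actions(False, OCF_NOT_RUNNING))
print(follow_up_actions(True, OCF_NOT_RUNNING))
```

So the same return code leads to different follow-up actions depending
only on what the cluster expected beforehand.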

> > If I have a resource threshold set >1, I get a start->monitor->stop
> > cycle until the threshold is consumed.
> 
> Then either your start is broken, or your monitor is broken. Try to
> validate your RA using ocf-tester before using it.
> 
> Regards,
> Ulrich
> 
> > 
> > /Ian.
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: OCF Return codes OCF_NOT_RUNNING

2018-07-11 Thread Ulrich Windl
>>> Ian Underhill wrote on 11.07.2018 at 13:27 in message:
> I'm trying to understand the behaviour of pacemaker when a resource monitor
> returns OCF_NOT_RUNNING instead of OCF_ERR_GENERIC, and whether pacemaker
> really cares.
> 
> The documentation states that a return code OCF_NOT_RUNNING from a monitor
> will not result in a stop being called on that resource, as it believes the
> node is still clean.
> 
> https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> 
> This makes sense, however in practice it is not what happens (unless I'm
> doing something wrong :) )
> 
> When my resource returns OCF_NOT_RUNNING for a monitor call (after a start
> has been performed), a stop is called.

Well, it depends. If your start was successful, pacemaker believes the resource
is running. If the monitor says it's stopped, pacemaker seems to try a "clean
stop" by calling the stop method (possibly before trying to start it again). Am
I right?

> 
> If I have a resource threshold set >1, I get a start->monitor->stop cycle
> until the threshold is consumed.

Then either your start is broken, or your monitor is broken. Try to validate 
your RA using ocf-tester before using it.
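
A rough way to picture the cycle described in the quoted text, assuming
the "resource threshold" means migration-threshold and that every failed
monitor counts as one failure on that node. This is a toy simulation
with made-up names (simulate_recovery_cycle, monitor_always_fails), not
how Pacemaker is implemented:

```python
def simulate_recovery_cycle(threshold, monitor_always_fails=True):
    """List the operations run on one node until the threshold is used up.

    Hypothetical model: each failed monitor adds one to the fail count,
    and recovery on the same node is attempted while the count is below
    the threshold.
    """
    ops = []
    fail_count = 0
    while fail_count < threshold:
        ops.append("start")
        ops.append("monitor")
        if not monitor_always_fails:
            # Healthy resource: the monitor succeeds and no recovery runs.
            break
        fail_count += 1
        # Recovery of a resource that should be running begins with a stop.
        ops.append("stop")
    return ops, fail_count


ops, failures = simulate_recovery_cycle(threshold=3)
print(" -> ".join(ops), "| failures:", failures)
```

With a monitor that always reports "not running" right after a start,
the loop repeats once per allowed failure, which is why a broken start
or monitor shows up as this repeating pattern.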

Regards,
Ulrich

> 
> /Ian.



