On Thu, 2019-07-11 at 10:39 +0100, lejeczek wrote:
> On 10/07/2019 15:50, Ken Gaillot wrote:
> > On Wed, 2019-07-10 at 11:26 +0100, lejeczek wrote:
> > > hi guys, possibly @devel if they pop in here.
> > >
> > > is there, will there be, a way to make the cluster deal with
> > > failed resources in such a way that the cluster would try not to
> > > give up on failed resources?
> > >
> > > I understand that as of now the only way is the user's manual
> > > intervention (under which I'd include any scripted ways outside
> > > of the cluster) if we need to bring back up a failed resource.
> > >
> > > many thanks, L.
> >
> > Not sure what you mean ... the default behavior is to try
> > restarting a failed resource up to 1,000,000 times on the same
> > node, then try starting it on a different node, and not give up
> > until all nodes have failed to start it.
> >
> > This is affected by on-fail, migration-threshold, failure-timeout,
> > and start-failure-is-fatal.
> >
> > If you're talking about a resource that failed because the entire
> > node failed, then fencing comes into play.
>
> Apologies for not being clear enough when wording my question; I see
> that now. When I said "make the cluster deal with failed resources",
> I meant a resource that has failed in the (whole) cluster, failed on
> every node.
>
> If that happens, I see that only my manual intervention can make the
> cluster look at the resource again, and I wonder whether there are
> ways, that I am unaware of, in which the cluster would do something
> by itself and not give up, without needing me.
>
> My case is: a systemd resource whose success or failure is
> determined by a mechanism outside of the cluster; it can only
> successfully start on one single node. When that node reboots, the
> cluster fails this resource, and when that node has rebooted and is
> up again, the failed resource remains in the failed state.
>
> Hopefully I managed to make it a bit clearer this time.
>
> Many thanks, L.
Ah, yes. failure-timeout is the only way to handle that. Keep in mind
it is not guaranteed to be checked more frequently than the
cluster-recheck-interval.
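For example (a minimal sketch using pcs; "myservice" is a
hypothetical resource name and the values are illustrative, not
recommendations):

  # Expire recorded failures after 60 seconds, so the cluster will
  # try the resource again on its own once the timeout has passed:
  pcs resource update myservice meta failure-timeout=60s

  # Expired failures are only noticed when the cluster rechecks its
  # state, so lower the recheck interval (default 15 minutes) if the
  # retry needs to happen sooner:
  pcs property set cluster-recheck-interval=2min

With something like that in place, once the node that can run the
resource is back up and the timeout has expired, the cluster should
attempt the start again without a manual cleanup.
-- 
Ken Gaillot <kgail...@redhat.com>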