Re: [Pacemaker] fencing to recover from failed resources

Bart Coninckx Thu, 13 Jan 2011 02:56:33 -0800

On Thursday 13 January 2011 11:13:42 Lars Marowsky-Bree wrote:
> On 2011-01-13T11:08:49, Bart Coninckx <bart.conin...@telenet.be> wrote:
> > thx for your answer.
> > So do I get this straight:
> > - resource undergoes monitor operation
> > - monitor reports failure
> > - a restart of the resource is issued (stop and start)
> > - stop fails
> > - PE decides to fence the node because of this regardless of the state of
> > other resources
> > 
> > Until I figure out why a stop fails (these are Xen resources, not sure
> > why an xm shutdown or xm destroy would fail ...), is there a way to make
> > Pacemaker less radical in fencing (without disabling fencing
> > altogether)?
> 
> You can set the on-fail behavior for stop operations too.
> 
> It defaults to "fence" since a failed stop implies that pacemaker was
> unable to recover the resource, and so it cannot be started again (on
> the same node or elsewhere). This typically implies a bug in the
> resource agent (which failed to perform the requested action) or a
> kernel bug (unkillable processes etc); hence, the only automated safe
> action that pacemaker can do to bring the resource into a clean state
> again is to fence the whole node.
> 
> If you don't want that, you can set on-fail="block", for example.
> 
> 
> Regards,
>     Lars
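
The sequence discussed above (monitor fails, restart is attempted, stop fails, on-fail decides what happens next) can be sketched as a toy model. This is only an illustration of the logic as described in this thread, not Pacemaker's policy engine code; the names `OnFail` and `recover` are hypothetical, and only the two on-fail values mentioned here are modelled.

```python
from enum import Enum


class OnFail(Enum):
    """The two on-fail values for the stop operation mentioned in this thread."""
    FENCE = "fence"   # the default for stop, per the explanation above
    BLOCK = "block"


def recover(monitor_ok, stop_ok, stop_on_fail=OnFail.FENCE):
    """Return the actions this toy model takes for a single resource."""
    if monitor_ok:
        return []                      # healthy, nothing to do

    actions = ["stop"]                 # a restart is a stop followed by a start
    if stop_ok:
        actions.append("start")
        return actions

    # The stop failed: the resource is in an unknown state, so it cannot
    # safely be started again on this node or elsewhere.
    if stop_on_fail is OnFail.FENCE:
        actions.append("fence node")
    else:
        actions.append("block resource (no further operations on it)")
    return actions


if __name__ == "__main__":
    print(recover(monitor_ok=False, stop_ok=True))
    print(recover(monitor_ok=False, stop_ok=False))
    print(recover(monitor_ok=False, stop_ok=False, stop_on_fail=OnFail.BLOCK))
```

Note that in this model the state of other resources on the node never enters the decision, which matches the "regardless of the state of other resources" point in the question.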



By the way: things seem better when I change the monitor timeout to 30 
seconds instead of 10 seconds. Very strange though, because the resource 
agent basically does an "xm list --long" while monitoring, which takes less 
than half a second in a console.
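
One way to check how long the command really takes when it is run under a deadline is to time it outside the cluster. The sketch below is a generic illustration and not how Pacemaker or the Xen resource agent enforces operation timeouts; `run_monitor` is a hypothetical helper, and the slow command is a stand-in so the example runs anywhere. On a Xen host the command would presumably be `["xm", "list", "--long"]`.

```python
import subprocess
import sys
import time


def run_monitor(cmd, timeout_s):
    """Run cmd under a deadline and return (status, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,         # child is killed when this expires
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timed out"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a check that is slower than expected (assumption:
    # replace with the real command on the node being investigated).
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_monitor(slow, timeout_s=0.5))   # deadline shorter than the run
    print(run_monitor(slow, timeout_s=10))    # deadline with headroom
```

Running this in a loop while the node is busy (for example during a domU start or migration) may show whether the command occasionally takes far longer than it does in an idle console.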


B. 



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
