On Thursday 13 January 2011 11:13:42 Lars Marowsky-Bree wrote:
> On 2011-01-13T11:08:49, Bart Coninckx <bart.conin...@telenet.be> wrote:
> > thx for your answer.
> > So do I get this straight:
> > - resource undergoes monitor operation
> > - monitor reports failure
> > - a restart of the resource is issued (stop and start)
> > - stop fails
> > - PE decides to fence the node because of this regardless of the state of
> >   other resources
> >
> > Until I figure out why a stop fails (these are Xen resources, not sure
> > why an xm shutdown or xm destroy would fail ...), is there a way to make
> > Pacemaker less radical in fencing (without disabling fencing
> > altogether?)
>
> You can set the on-fail behavior for stop operations too.
>
> It defaults to "fence" since a failed stop implies that pacemaker was
> unable to recover the resource, and so it cannot be started again (on
> the same node or elsewhere). This typically implies a bug in the
> resource agent (which failed to perform the requested action) or a
> kernel bug (unkillable processes etc); hence, the only automated safe
> action that pacemaker can do to bring the resource into a clean state
> again is to fence the whole node.
>
> If you don't want that, you can set on-fail="block", for example.
>
>
> Regards,
>     Lars

By the way: things seem better when I change the monitor timeout to 30
seconds instead of 10 seconds. Very strange though, because the resource
agent basically does an "xm list --long" while monitoring, which takes less
than half a second in a console.

B.
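
For reference, a minimal sketch of how the two settings discussed in this
thread (the longer monitor timeout and on-fail="block" for the stop
operation) might look in crm shell syntax. The resource name "xen-vm1", the
xmfile path, the monitor interval and the stop timeout are hypothetical
placeholders, not values taken from the actual cluster:

```
# Hypothetical Xen primitive; adjust name, xmfile and timeouts to your setup.
primitive xen-vm1 ocf:heartbeat:Xen \
    params xmfile="/etc/xen/vm/vm1" \
    op monitor interval="10s" timeout="30s" \
    op stop interval="0" timeout="120s" on-fail="block"
```

With on-fail="block", a failed stop should leave the resource blocked on
that node instead of triggering fencing, so it will not be recovered
anywhere until the failure is cleared by hand (for example with
"crm resource cleanup xen-vm1").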