On Wed, May 20, 2009 at 1:31 AM, Bob Haxo <bh...@sgi.com> wrote:
> Greetings,
>
> I liked the idea of not starting the cluster at boot, and found that the
> fenced node would reboot and then openais start brought the node onboard
> without triggering a reboot of the already running node.
>
> Then magic happened. I chkconfig'd openais to start with boot, re-ran the
> "ifdown eth0" command that had been triggering STONITH and then the STONITH
> deathmarch, and, well, everything worked. I've done this test many 10s of
> times without a STONITH deathmarch.
>
> Unfortunately, I haven't a clue as to what was changed that cleared the
> issue.

At a guess, I'd say you removed no-quorum-policy=ignore.

OpenAIS-based clusters don't pretend they have quorum when only 1 of the
2 nodes is available (and you can't start shooting until you have quorum
or the above option is set).

> Thanks for all the suggestions.
>
> Cheers,
> Bob Haxo
>
>
> On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote:
>
>> On Mon, May 18, 2009 at 8:12 PM, Bob Haxo <bh...@sgi.com> wrote:
>>>
>>> Any suggestions as to what needs changing so that the stonith
>>> deathmarch can be avoided?
>>
>> If you only have two nodes, the only two ways have already been
>> discussed: use poweroff, or don't start the cluster at boot.
>> If you don't want to do either of those, the only way to terminate the
>> stonith loop is to fix the network failure.
>>
>> If you had 3 or more nodes, the returning node wouldn't have quorum
>> and therefore wouldn't be allowed to shoot anyone.

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
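
The quorum reasoning in this thread can be illustrated with a small model. This is only a sketch, not Pacemaker code: the function names `has_quorum` and `may_fence` are hypothetical, quorum is assumed to mean a strict majority of configured nodes, and the default policy value "stop" is an assumption of the model. It shows why a lone node in a 2-node cluster can only shoot its peer when no-quorum-policy=ignore is set, and why a lone returning node in a 3-node cluster cannot shoot anyone.

```python
def has_quorum(nodes_online: int, nodes_total: int) -> bool:
    """Assumed rule: quorum means a strict majority of configured nodes."""
    return 2 * nodes_online > nodes_total


def may_fence(nodes_online: int, nodes_total: int,
              no_quorum_policy: str = "stop") -> bool:
    """Hypothetical helper: a partition may shoot peers only if it has
    quorum, or if no-quorum-policy is set to 'ignore'."""
    return has_quorum(nodes_online, nodes_total) or no_quorum_policy == "ignore"


if __name__ == "__main__":
    # 2-node cluster, one node back after being fenced, peer unreachable
    print(may_fence(1, 2))            # False: 1 of 2 is not a majority
    print(may_fence(1, 2, "ignore"))  # True: this allows the STONITH loop
    # 3-node cluster
    print(may_fence(1, 3))            # False: the returning node is alone
    print(may_fence(2, 3))            # True: the surviving pair has quorum
```

Under these assumptions, removing no-quorum-policy=ignore on a 2-node cluster means the rebooted node cannot fence the running node while the network failure persists, which matches the behaviour Bob describes.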