On Mon, Mar 21, 2011 at 4:06 PM, Pavel Levshin <pa...@levshin.spb.ru> wrote:
> Hi.
>
> Today, we had a network outage. Quite a few problems suddenly arose in our
> setup, including a crashed corosync, the known notify bug in the DRBD RA, and
> a problem with the VirtualDomain RA timing out on stop.
>
> But the fencing behaviour was particularly strange.
>
> Initially, one node (wapgw1-1) parted from the cluster. When the connection
> was restored, corosync died on that node. It was considered "offline
> unclean" and was scheduled to be fenced. Fencing by HP iLO did not work
> (currently, I do not know why). The second-priority fencing method is
> meatware, and it took time.
>
> The second node, wapgw1-2, hit the DRBD notify bug and failed to stop some
> resources. It was "online unclean". It was also scheduled to be fenced. HP
> iLO was available for this node, but it was not STONITHed until I
> manually confirmed STONITH for wapgw1-1.
>
> When I confirmed the first node's restart, the second node was fenced
> automatically.
>
> Is this ordering intended behaviour or a bug?

A little of both.

The ordering (in the PE) was added because stonithd wasn't able to cope
with parallel fencing operations. I don't know if this is still the case
for stonithd in 1.0. Perhaps Dejan can comment.

Unfortunately, as you saw, this means that we fence nodes one by one, and
that if op N fails, we never try any op > N.

Ideally the ordering would be removed; let's see what Dejan has to say.

> It's pacemaker 1.0.10, corosync 1.2.7. Three-node cluster.
>
> --
> Pavel Levshin
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
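
To make the effect of that ordering concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the PE or stonithd code: the function names (`fence_serialized`, `fence_independent`, `simulated_fence`) are made up for illustration, and the simulated failure of wapgw1-1 just stands in for the iLO failure and the pending meatware confirmation.

```python
from typing import Callable, List


def fence_serialized(targets: List[str], fence: Callable[[str], bool]) -> List[str]:
    """Fence nodes strictly in order; once op N fails, ops > N are never tried."""
    fenced = []
    for node in targets:
        if not fence(node):
            break  # everything queued behind the failed op stays unfenced
        fenced.append(node)
    return fenced


def fence_independent(targets: List[str], fence: Callable[[str], bool]) -> List[str]:
    """Try every node regardless of how the other fencing ops turned out."""
    return [node for node in targets if fence(node)]


def simulated_fence(node: str) -> bool:
    # Assumption for illustration: fencing wapgw1-1 does not complete
    # (iLO failed, meatware not yet confirmed); wapgw1-2's iLO works.
    return node != "wapgw1-1"


targets = ["wapgw1-1", "wapgw1-2"]
print("serialized: ", fence_serialized(targets, simulated_fence))
print("independent:", fence_independent(targets, simulated_fence))
```

With the ordering in place, wapgw1-2 is not fenced while wapgw1-1 is still pending, which matches what you observed. Without the ordering, wapgw1-2 would be fenced right away.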