Ah, ok, now I get it. So node 2 should wait until it's confident that the lost node has either shut down or been killed by its watchdog timer. After that, it will consider the node fenced and proceed with recovery. I don't think ATB factors in here, as the cluster should treat this as a simple "node was lost, fencing finally worked, it's safe to recover now" situation.
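For what it's worth, that "wait until the watchdog must have fired" behaviour is what watchdog-only SBD setups rely on. A minimal sketch, assuming SBD in watchdog-only mode on a Pacemaker cluster; all timeout values here are illustrative, not recommendations:

```shell
# Illustrative sketch only: watchdog-only SBD fencing with Pacemaker.
# The survivor waits stonith-watchdog-timeout before treating the lost
# node as fenced, so it must comfortably exceed the hardware watchdog
# timeout configured for SBD.

# /etc/sysconfig/sbd (on every node), e.g.:
#   SBD_WATCHDOG_DEV=/dev/watchdog
#   SBD_WATCHDOG_TIMEOUT=5

# Tell Pacemaker how long to wait before assuming the lost node's
# watchdog has killed it; a common rule of thumb is roughly
# 2x SBD_WATCHDOG_TIMEOUT.
pcs property set stonith-watchdog-timeout=10s
pcs property set stonith-enabled=true
```

The key point matches the reasoning above: the survivor never "knows" the peer is dead, it only knows that enough time has passed that the peer's watchdog must have expired.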
The node IDs shouldn't matter in this case. What decides the winner is who is allowed access to the shared storage. The one that can is allowed to keep kicking its watchdog. The one that loses access, assuming it is alive at all, should be forced off when its watchdog timer expires.

digimer

On 2018-04-28 09:19 PM, Wei Shan wrote:
> Hi,
>
> I'm using Redhat Cluster Suite 7 with a watchdog-timer-based fence
> agent. I understand this is a really bad setup, but it is what the
> end-user wants.
>
> ATB => auto_tie_breaker
>
> "When the auto_tie_breaker is used in even-number member clusters, then
> the failure of the partition containing the auto_tie_breaker_node (by
> default the node with lowest ID) will cause other partition to become
> inquorate and it will self-fence. In 2-node clusters with
> auto_tie_breaker this means that failure of node favoured by
> auto_tie_breaker_node (typically nodeid 1) will result in reboot of
> other node (typically nodeid 2) that detects the inquorate state. If
> this is undesirable then corosync-qdevice can be used instead of the
> auto_tie_breaker to provide additional vote to quorum making behaviour
> closer to odd-number member clusters."
>
> Thanks
>
> On Sun, 29 Apr 2018 at 02:15, Digimer <li...@alteeve.ca
> <mailto:li...@alteeve.ca>> wrote:
>
> On 2018-04-28 09:06 PM, Wei Shan wrote:
> > Hi all,
> >
> > If I have a 2-node cluster with ATB enabled and the node with the
> > lowest node ID fails, what will happen? My assumption is that the
> > node with the higher node ID will self-fence and be rebooted. What
> > happens after that?
> >
> > Thanks!
> >
> > --
> > Regards,
> > Ang Wei Shan
>
> Which cluster stack is this? I am not familiar with the term "ATB".
>
> If it's a standard pacemaker or cman/rgmanager cluster, then on node
> failure, the good node should block and request a fence (a lost node is
> not allowed to be assumed gone via self-fence, except when using a
> watchdog-timer-based fence agent).
> If the fence doesn't work, the survivor should remain blocked (better
> to hang than risk corruption). If the fence succeeds, then the survivor
> node will recover any lost services based on the configuration of those
> services (usually a simple (re)start on the good node).
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay
> Gould
>
>
> --
> Regards,
> Ang Wei Shan

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
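[Editor's note] The auto_tie_breaker behaviour quoted in this thread is configured in the votequorum section of corosync.conf. A hypothetical fragment for a 2-node cluster, with illustrative values, might look like:

```shell
# Hypothetical corosync.conf quorum section for a 2-node cluster using
# auto_tie_breaker instead of two_node mode (values are illustrative).
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    # "lowest" is the default: the partition containing the lowest node
    # ID keeps quorum; the other partition goes inquorate and, with a
    # watchdog-based agent, self-fences.
    auto_tie_breaker_node: lowest
}
```

As the quoted votequorum documentation notes, corosync-qdevice is the usual alternative when the "lowest node ID always wins" behaviour is undesirable.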