On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> 11.07.2018 05:45, Confidential Company wrote:
>> Not true, the faster node will kill the slower node first. It is
>> possible that through misconfiguration, both could die, but it's rare
>> and easily avoided with a 'delay="15"' set on the fence config for the
>> node you want to win.
>>
>> Don't use a delay on the other node, just the node you want to live in
>> such a case.
>>
>> **
>> 1. Given Active/Passive setup, resources are active on Node1
>> 2. fence1(prefers to Node1, delay=15) and fence2(prefers to
>>    Node2, delay=30)
>> 3. Node2 goes down
>> 4. Node1 thinks Node2 goes down / Node2 thinks Node1 goes down
> If node2 is down, it cannot think anything.

True. For my answer below I am assuming it is not really down but just
somehow disconnected.

>
>> 5. fence1 counts 15 seconds before it fences Node1 while
>>    fence2 counts 30 seconds before it fences Node2
>> 6. Since fence1 has a shorter delay than fence2, fence1
>>    executes and shuts down Node1.
>> 7. fence1(action: shutdown Node1) will always trigger first
>>    because it has a shorter delay than fence2.
>>
>> ** Okay, what's important is that they should be different. But in the
>> case above, even though Node2 goes down, Node1 has the shorter delay,
>> so Node1 gets fenced/shut down. This is a sample scenario. I don't get
>> the point. Can you comment on this?

You didn't send the actual config, but from your description I
understand the scenario this way:

fencing-resource fence1 is running on Node2, it is there to fence Node1,
and it has a delay of 15s.
fencing-resource fence2 is running on Node1, it is there to fence Node2,
and it has a delay of 30s.

If they now begin to fence each other at the same time, the node
actually fenced would of course be Node1, as the fencing-resource fence1
is gonna shoot 15s earlier than fence2.

Looks consistent to me ...

Regards,
Klaus

>>
>> Thanks
>>
>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger <kwenn...@redhat.com>
>> wrote:
>>
>>> On 07/09/2018 05:53 PM, Digimer wrote:
>>>> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
>>>>> On 07/09/2018 05:33 PM, Digimer wrote:
>>>>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>>>>>> On 07/09/2018 03:49 PM, Digimer wrote:
>>>>>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>>>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Any ideas what triggers the fencing script or stonith?
>>>>>>>>>>
>>>>>>>>>> Given the setup below:
>>>>>>>>>> 1. I have two nodes
>>>>>>>>>> 2. Configured fencing on both nodes
>>>>>>>>>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>>>>>>>>>>    fence2(for Node2) respectively
>>>>>>>>>>
>>>>>>>>>> *What does it mean to configure a delay in stonith? Wait for
>>>>>>>>>> 15 seconds before it fences the node?
>>>>>>>>> Given that on a 2-node-cluster you don't have real quorum to
>>>>>>>>> make one partial cluster fence the rest of the nodes, the
>>>>>>>>> different delays are meant to prevent a fencing-race.
>>>>>>>>> Without different delays that would lead to both nodes fencing
>>>>>>>>> each other at the same time - finally both being down.
>>>>>>>> Not true, the faster node will kill the slower node first. It is
>>>>>>>> possible that through misconfiguration, both could die, but it's
>>>>>>>> rare and easily avoided with a 'delay="15"' set on the fence
>>>>>>>> config for the node you want to win.
>>>>>>> What exactly is not true? Aren't we saying the same?
>>>>>>> Of course one of the delays can be 0 (most important is that
>>>>>>> they are different).
>>>>>> Perhaps I misunderstood your message. It seemed to me that the
>>>>>> implication was that fencing in 2-node without a delay always ends
>>>>>> up with both nodes being down, which isn't the case. It can happen
>>>>>> if the fence methods are not set up right (i.e. the node isn't set
>>>>>> to immediately power off on ACPI power button event).
>>>>> Yes, a misunderstanding I guess.
>>>>>
>>>>> Should have been more verbose in saying that due to the
>>>>> time between the fencing-command being fired off to the fencing
>>>>> device and the actual fencing taking place (as you state
>>>>> dependent on how it is configured in detail - but a measurable
>>>>> time in all cases) there is a certain probability that when
>>>>> both nodes start fencing at roughly the same time we will
>>>>> end up with 2 nodes down.
>>>>>
>>>>> Everybody has to find his own tradeoff between how reliably
>>>>> fence-races are prevented and the fencing delay I guess.
>>>> We've used this;
>>>>
>>>> 1. IPMI (with the guest OS set to immediately power off) as primary,
>>>>    with a 15 second delay on the active node.
>>>>
>>>> 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
>>>>    for when IPMI fails, with no delay.
>>>>
>>>> In ~8 years, across dozens and dozens of clusters and countless fence
>>>> actions, we've never had a dual-fence event (where both nodes go down).
>>>> So it can be done safely, but as always, test test test before prod.
>>> No doubt that this setup is working reliably.
>>> You just have to know your fencing-devices and
>>> which delays they involve.
>>>
>>> If we are talking about SBD (with disk, as otherwise
>>> it doesn't work in a sensible way in 2-node-clusters)
>>> for instance, I would strongly advise using a delay.
>>>
>>> So I guess it is important to understand the basic
>>> idea behind this different delay-based fence-race
>>> avoidance.
>>> Afterwards you can still decide why it is no issue
>>> in your own setup.
>>>
>>>>>> If the delay is set on both nodes, and they are different, it will
>>>>>> work fine. The reason not to do this is that if you use 0, then
>>>>>> don't use anything at all (0 is default), and any other value
>>>>>> causes avoidable fence delays.
>>>>>>
>>>>>>>> Don't use a delay on the other node, just the node you want to
>>>>>>>> live in such a case.
>>>>>>>>
>>>>>>>>>> *Given Node1 is active and Node2 goes down, does it mean fence1
>>>>>>>>>> will execute first and shut down Node1 even though Node2 goes
>>>>>>>>>> down?
>>>>>>>>> If Node2 managed to sign off properly it will not.
>>>>>>>>> If the network-connection is down so that Node2 can't inform
>>>>>>>>> Node1 that it is going down and finally has stopped all
>>>>>>>>> resources, it will be fenced by Node1.
>>>>>>>>> Regards,
>>>>>>>>> Klaus
>>>>>>>> Fencing occurs in two cases;
>>>>>>>>
>>>>>>>> 1. The node stops responding (meaning it's in an unknown state,
>>>>>>>>    so it is fenced to force it into a known state).
>>>>>>>> 2. A resource / service fails to stop. In this case, the service
>>>>>>>>    is in an unknown state, so the node is fenced to force the
>>>>>>>>    service into a known state so that it can be safely recovered
>>>>>>>>    on the peer.
>>>>>>>>
>>>>>>>> Graceful withdrawal of the node from the cluster, and graceful
>>>>>>>> stopping of services will not lead to a fence (because in both
>>>>>>>> cases, the node / service are in a known state - off).
>>>>>>>>
>>
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
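
For anyone who wants to play with the delay reasoning from this thread, here is a rough toy model in Python. It is only a sketch under simplifying assumptions and does not use or reproduce Pacemaker itself: both nodes start fencing at t=0, each fence device is executed by the peer of its target, and `exec_time` is a hypothetical stand-in for the time between the fence command being fired and the target actually powering off. The names `FenceDevice` and `fence_race` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceDevice:
    name: str      # label only, e.g. "fence1"
    target: str    # node this device shoots
    delay: float   # seconds to wait before the fence action fires


def fence_race(devices, exec_time=0.0):
    """Return {node: time_it_powers_off} for a two-node fence race.

    Toy assumptions: every device starts counting at t=0, a device is
    run by the peer of its target, and it only fires if that peer has
    not already powered off when the delay expires.
    """
    nodes = {d.target for d in devices}
    death_time = {}
    # Stable sort: devices with equal delays keep their input order.
    for dev in sorted(devices, key=lambda d: d.delay):
        executors = nodes - {dev.target}
        # The executor counts as alive if it powers off strictly after
        # this device's delay expires (or never powers off at all).
        alive = any(death_time.get(n, float("inf")) > dev.delay
                    for n in executors)
        if alive:
            death_time.setdefault(dev.target, dev.delay + exec_time)
    return death_time


if __name__ == "__main__":
    # Scenario from the thread: fence1 shoots Node1 after 15s,
    # fence2 shoots Node2 after 30s.
    thread_case = [FenceDevice("fence1", "Node1", 15),
                   FenceDevice("fence2", "Node2", 30)]
    print(fence_race(thread_case))  # {'Node1': 15.0}

    # No delay on either side, fencing takes 4s to complete: both die.
    no_delay = [FenceDevice("a", "Node1", 0), FenceDevice("b", "Node2", 0)]
    print(fence_race(no_delay, exec_time=4))
```

In this model the thread's configuration always ends with Node1 fenced, which matches the description above, and equal delays combined with a non-zero `exec_time` take both nodes down. Putting the single delay on the device that targets the node you want to survive gives the peer's fence action time to complete first.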