>>> Andrei Borzenkov <[email protected]> wrote on 18.12.2020 at 08:21 in message
<[email protected]>:
> 18.12.2020 10:09, Ulrich Windl wrote:
>>>>> Andrei Borzenkov <[email protected]> wrote on 18.12.2020 at 08:01 in
>> message <[email protected]>:
>>> 17.12.2020 21:30, Ken Gaillot wrote:
>>>>
>>>> This reminded me that some IPMI implementations return "success" for
>>>> commands before they've actually been completed. This is why
>>>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>>>>
>>>
>>> But in this case we also do not know whether the command has been
>>> completed successfully or not. I'd say in this case the only safe way
>>> is to use poweroff and verify in the stonith agent that the node is
>>> actually powered off before returning success.
>>
>> As I wrote in my message, the other node showing that a node has left
>> would be an indication that fencing was successful
>
> You got it backwards. The fencing starts when pacemaker gets an indication
> that the other node has left.
>
>> IF there was a valid network connection up to the fencing event. Thus I
>> think a redundant network is rather important. The user should be able
>> to tell whether fencing actually does work; maybe not from syslog, but
>> from other indicators.
>
> Completely wrong. Fencing is needed exactly when there is no possibility
> to get information about the other node and there is no way to verify
> the other node's state using "normal" means.

Alexamder: I was not talking about "when fencing is needed", but about
"what may indicate that fencing happened".

>
> A redundant network helps to avoid unnecessary fencing; it is not a
> replacement for fencing.
>
>> Also, if the network outage were simulated by using a node-specific
>> blackhole route (blocking just the other node(s)), the node could be
>> queried (for example) by a ping from a third node to see whether and
>> when it actually went down.
>>
>
> And? How should two isolated pacemaker instances now communicate and
> coordinate activity, even if there is connectivity via some of the other
> networks available on the nodes? Using multiple rings utilizing all
> available networks falls under "redundant network".

I'm afraid you misunderstood. See above.

Regards,
Ulrich

>
>> Regards,
>> Ulrich
>>
>>>
>>>> The best thing would be to do some manual testing using ipmitool or
>>>> whatnot to turn off the power, and observe how long it takes between
>>>> when the command returns and the server actually is powered down. Then
>>>> set power_wait to a comfortable margin above that. Or just keep raising
>>>> power_wait until the problem goes away :)
>>>>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
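
The manual test Ken describes in the quoted text (turn the power off with ipmitool, then see how long it takes until the server is really down) could be scripted roughly as below. This is only a sketch under assumptions: the names `IPMI_BASE`, `run_ipmitool` and `measure_power_off_delay`, as well as the BMC address and credentials, are hypothetical placeholders and not part of fence_ipmilan or Pacemaker. It relies only on the standard `ipmitool chassis power off` and `chassis power status` sub-commands, and it trusts the status the BMC reports, which may itself lag or be optimistic on some hardware, so a check from outside (for example a ping from a third node) is still worthwhile.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical connection details: replace with your BMC's address and credentials.
IPMI_BASE = ["ipmitool", "-I", "lanplus",
             "-H", "bmc.example.com", "-U", "admin", "-P", "secret"]


def run_ipmitool(args: Sequence[str]) -> str:
    """Run ipmitool with the given sub-command and return its stdout."""
    result = subprocess.run([*IPMI_BASE, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def measure_power_off_delay(
    run: Callable[[Sequence[str]], str] = run_ipmitool,
    poll_interval: float = 0.5,
    timeout: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Issue 'chassis power off', then poll 'chassis power status'.

    Returns the seconds elapsed between the power-off command returning
    and the BMC first reporting the chassis as off. The result is only as
    fine-grained as poll_interval. Raises TimeoutError if the chassis is
    still reported as on when the timeout expires.
    """
    run(["chassis", "power", "off"])
    returned_at = clock()  # the moment the command claimed success
    while clock() - returned_at < timeout:
        status = run(["chassis", "power", "status"])
        # ipmitool prints "Chassis Power is on" or "Chassis Power is off".
        if "is off" in status.lower():
            return clock() - returned_at
        sleep(poll_interval)
    raise TimeoutError(f"chassis still reported on after {timeout} seconds")
```

Calling `measure_power_off_delay()` a few times against a test node (it really does power the machine off) gives a rough upper bound on the delay; power_wait can then be set to a comfortable margin above the largest value seen. The same poll-until-off loop is the kind of check Andrei has in mind when he suggests verifying that the node is actually powered off before reporting success.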
