[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

Ulrich Windl Thu, 17 Dec 2020 23:09:59 -0800

>>> Andrei Borzenkov <[email protected]> wrote on 18.12.2020 at 08:01 in
message <[email protected]>:
> 17.12.2020 21:30, Ken Gaillot wrote:
>> 
>> This reminded me that some IPMI implementations return "success" for
>> commands before they've actually been completed. This is why
>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>> 
> 
> But in this case we also do not know whether the command has been completed
> successfully or not. I'd say in this case the only safe way is to use
> poweroff and verify in the stonith agent that the node is actually powered
> off before returning success.
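
The "verify before returning success" idea could be sketched roughly as
below. This is only an illustration, not how fence_ipmilan is implemented:
wait_until_off and ipmi_power_state are hypothetical names, and the helper
assumes that "ipmitool ... chassis power status" prints a line such as
"Chassis Power is off". The returned elapsed time may also help when
choosing a power_wait value.

```python
import subprocess
import time
from typing import Callable, List, Optional


def ipmi_power_state(ipmitool_args: List[str]) -> str:
    """Return 'on' or 'off' as reported by the BMC.

    Assumption: ipmitool_args holds the connection options (interface,
    host, credentials) and the output looks like 'Chassis Power is off'.
    """
    out = subprocess.run(
        ["ipmitool", *ipmitool_args, "chassis", "power", "status"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "off" if out.strip().lower().endswith("off") else "on"


def wait_until_off(
    get_power_state: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the node reports 'off'.

    Returns the elapsed seconds on success, or None if the node still
    reports 'on' once the timeout has expired.
    """
    start = clock()
    while True:
        if get_power_state() == "off":
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

A caller would issue the power-off command first and report success only
if wait_until_off(...) returns a number rather than None.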


As I wrote in my message, the other node showing that a node has left would be
an indication that fencing was successful IF there was a valid network
connection up to the fencing event. Thus I think a redundant network is rather
important. The user should be able to tell whether fencing actually works;
maybe not from syslog, but from other indicators.
Also, if the network outage were simulated by using a node-specific blackhole
route (blocking just the other node(s)), the node could be queried (for
example) by a ping from a third node to see whether and when it actually went
down.
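
A minimal sketch of such a watcher, to be run on the third node, might look
like this. The names ping_once and watch_until_down are made up for
illustration, and the ping flags assume Linux iputils ("-c 1 -W 1"); other
platforms use different options.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str) -> bool:
    """Send one ICMP echo request and wait up to 1 second for the reply.

    Assumption: Linux iputils ping, where -c is the packet count and -W
    the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_until_down(
    probe: Callable[[], bool],
    misses_required: int = 3,
    interval: float = 1.0,
    max_probes: int = 600,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Probe repeatedly until the node stops answering.

    Returns the timestamp of the first miss in a run of misses_required
    consecutive misses, or None if the node still answered after
    max_probes attempts. Requiring several misses in a row avoids
    treating a single lost packet as "node down".
    """
    misses = 0
    first_miss = None
    for _ in range(max_probes):
        if probe():
            misses = 0
            first_miss = None
        else:
            if misses == 0:
                first_miss = now()
            misses += 1
            if misses >= misses_required:
                return first_miss
        sleep(interval)
    return None
```

For example, watch_until_down(lambda: ping_once("node2")) (with "node2" as
a placeholder host name) returns roughly when the node stopped responding,
which can then be compared with the time the fencing command returned.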

Regards,
Ulrich

> 
>> The best thing would be to do some manual testing using ipmitool or
>> whatnot to turn off the power, and observe how long it takes between
>> when the command returns and the server actually is powered down. Then
>> set power_wait to a comfortable margin above that. Or just keep raising
>> power_wait until the problem goes away :)
>> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 


