On 3/5/21 8:14 AM, Ulrich Windl wrote:
>>>> Digimer <[email protected]> wrote on 04.03.2021 at 06:35 in message
> <[email protected]>:
>> On 2021-03-03 1:56 a.m., Ulrich Windl wrote:
>>>>>> Eric Robinson <[email protected]> wrote on 02.03.2021 at 19:26 in
>>> message
>>> <sa2pr03mb58847e37845fc6c92bc3007efa...@sa2pr03mb5884.namprd03.prod.outlook.com>
>>>>> -----Original Message-----
>>>>> From: Users <[email protected]> On Behalf Of Digimer
>>>>> Sent: Monday, March 1, 2021 11:02 AM
>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed
>>>>> <[email protected]>; Ulrich Windl <[email protected]>
>>>>> Subject: Re: [ClusterLabs] Antw: [EXT] Re: "Error: unable to fence
>>>>> '001db02a'"
>>> ...
>>>>>>> Cloud fencing usually requires a higher timeout (20s reported here).
>>>>>>>
>>>>>>> Microsoft seems to suggest the following setup:
>>>>>>>
>>>>>>> # pcs property set stonith-timeout=900
>>>>>> But doesn't that mean the other node waits 15 minutes after stonith
>>>>>> until it performs the first post-stonith action?
>>>>> No, it means that if there is no reply by then, the fence has failed. If
>>>>> the fence happens sooner, and the caller is told this, recovery begins
>>>>> very shortly after.
>>> How would the fencing be confirmed? I don't know.
>> It's part of the FenceAgentAPI. The cluster invokes the fence agent,
>> passes in variable=value pairs on STDIN, and waits for the agent to
>> exit. It reads the agent's exit code and uses that to determine success
>> or failure.
> But the agent "acting remote" cannot be sure the "remote end" was killed,
> specifically when the network connection seems dead.
> I see that in the IPMI case you have a separate connection allowing
> "out-of-band signaling", but in the general case that would not be possible.

Fence-agents are expected to be implemented in a way that a positive
return of a fence-action implies verification on the "remote end".
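
As a rough illustration of that contract, here is a minimal Python sketch.
It is not taken from any shipped fence agent. The option names ("action",
"port") are only examples of the variable=value pairs mentioned above, and
power_off / get_status are hypothetical callbacks standing in for whatever
out-of-band channel the device offers. The timeout and interval values are
arbitrary. The point is only that the agent reports success (exit code 0)
after it has confirmed the target is off, and reports failure if it cannot
confirm that in time.

```python
import time


def parse_options(lines):
    """Parse variable=value pairs as the cluster would write them to STDIN.

    `lines` is any iterable of text lines (sys.stdin in a real agent).
    Blank lines, comments and lines without '=' are ignored.
    """
    options = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def fence_off(power_off, get_status, timeout=20.0, interval=1.0):
    """Request power-off, then verify it before reporting success.

    power_off and get_status are hypothetical callbacks for the device's
    out-of-band channel. Returns 0 only if get_status() reports "off"
    before the timeout expires, otherwise 1.
    """
    try:
        power_off()
    except Exception:
        # The request may have failed or may have been lost; the status
        # check below decides the outcome either way.
        pass

    deadline = time.monotonic() + timeout
    while True:
        try:
            if get_status() == "off":
                return 0  # verified: the target is really off
        except Exception:
            pass  # device unreachable: the state cannot be confirmed
        if time.monotonic() >= deadline:
            return 1  # no confirmation in time: report failure
        time.sleep(interval)
```

A real agent would read sys.stdin with parse_options(), dispatch on the
requested action, and pass the result of fence_off() to sys.exit(). If the
status channel is unreachable, the sketch returns failure rather than
guessing, which is the case Ulrich is asking about.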

If you don't have such an "out-of-band signaling" channel and still want
resources to be rescheduled when the network drops somewhere, the only
thing left is SBD (watchdog fencing; with poison-pill/shared-disk you would
be using the communication with the shared disk as this kind of
"out-of-band signaling") if you want to stay with a single cluster, or
booth if you can imagine going with multiple clusters.

>> So if the fence agent is invoked and 5 seconds later, it exits with the
>> "success" RC, the cluster knows the peer is gone and that it can now
>> safely begin recovery.
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com/w/
>> "I am, somehow, less interested in the weight and convolutions of
>> Einstein’s brain than in the near certainty that people of equal talent
>> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
