On Sun, Feb 28, 2021 at 03:34:20PM +0000, Eric Robinson wrote:
> 001db02b rebooted. After it came back up, I tried it in the other direction.
>
> On node 001db02b, the command...
>
> # pcs stonith fence 001db02a
>
> ...produced output...
>
> Error: unable to fence '001db02a'.
>
> However, node 001db02a did get restarted!
>
> We also saw this error...
>
> Failed Actions:
> * stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70,
>     status=Timed Out, exitreason='',
>     last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms
>
> When that happens, does Pacemaker take over the other node's resources, or
> not?
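
The exec= field in the failed action quoted above shows how long the start operation ran before it was aborted. As a rough sketch (plain shell, no cluster needed; the variable names are only illustrative, and the text is pasted from the report rather than read from a live cluster), it can be pulled out and converted to seconds like this:

```shell
#!/bin/sh
# Failed-action text copied from the report quoted above (not read from a live cluster).
failed_action="stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70, status=Timed Out, exitreason='', last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms"

# Extract the number from "exec=<N>ms".
exec_ms=$(printf '%s\n' "$failed_action" | sed -n 's/.*exec=\([0-9][0-9]*\)ms.*/\1/p')

# Whole seconds, rounded down.
exec_s=$((exec_ms / 1000))

echo "start ran for ${exec_s}s before timing out"
```

That comes out at about 20 seconds, which is consistent with the start of the fence device running into a 20s timeout.
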

Cloud fencing usually requires a higher timeout (20s reported here).

Microsoft seems to suggest the following setup:

# pcs property set stonith-timeout=900
# pcs stonith create rsc_st_azure fence_azure_arm username="login ID" \
    password="password" resourceGroup="resource group" tenantId="tenant ID" \
    subscriptionId="subscription id" \
    pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
    power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 pcmk_action_limit=3 \
    op monitor interval=3600

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker

-- 
Valentin