Hi there, first of all, thank you both for your suggestions and observations, and apologies for my late reply.
I will check the logs on both hosts (although only one of them seems to be affected) and will get back to you with any findings. Just to confirm my reading of the error message for the monitor operation: host zc-mail-2.zylacloud.com gets a connection timeout while monitoring the resource fence_zc-mail-1_virsh, right?

My question here is: what does the monitor operation actually do to decide whether it succeeded? Does it perform the same operation that is configured in the stonith resource and expect a particular exit code?

Thanks once again

-----Original Message-----
From: Dan Swartzendruber
To: Cluster Labs - All topics related to open-source clustering welcomed
Cc: Luke Camilleri
Subject: Re: [ClusterLabs] connection timed out fence_virsh monitor stonith
Date: Mon, 24 Feb 2020 12:24:16 -0500

On 2020-02-24 12:17, Strahil Nikolov wrote:

On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri wrote:

Hello users, I would like to ask for assistance on the below setup please, mainly on the monitor fence timeout:

I notice that the issue happens at 00:00 on both days. Have you checked for a backup or other cron job that is 'overloading' the virtualization host?

This is a very good point. I had a similar problem with a vSphere cluster with two hyper-converged storage appliances. I used the fence_vmware_rest (or fence_vmware_soap) stonith agent to fence the storage appliances, and it worked just fine until the vCenter server appliance got busy doing something or other. Next thing I know, I'm getting stonith agent timeouts. I ended up switching to fence_scsi. Not sure there is a good answer.
I saw a recommendation on a VMware forum to increase the stonith timeout, but the recommended value was close to a minute, which is long enough to cause problems for the VMs in that cluster...
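As a rough illustration of the monitor question in this thread, here is a minimal Python sketch under stated assumptions, not Pacemaker's actual implementation. It assumes the cluster runs the fence agent with `action=monitor` and its configured options passed as `key=value` lines on stdin, treats exit status 0 as success, and counts anything that exceeds the operation timeout as a failure. It also assumes that for fence_virsh a monitor typically means logging in to the hypervisor over SSH and running a lightweight query rather than fencing anything, so a "Connection timed out" would point at the SSH/hypervisor side. The function `run_fence_monitor` and the `MonitorResult` type are hypothetical helpers introduced only for this sketch.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MonitorResult:
    ok: bool                  # True only if the agent exited with status 0
    timed_out: bool           # True if the agent was killed after `timeout` seconds
    exit_code: Optional[int]  # None when the agent timed out
    elapsed: float            # wall-clock seconds the call took
    stdout: str               # whatever the agent printed (useful for debugging)


def run_fence_monitor(agent_cmd: List[str], options: Dict[str, str],
                      timeout: float) -> MonitorResult:
    """Hypothetical helper: invoke a fence agent with action=monitor, passing
    options as key=value lines on stdin (an assumption about how the cluster
    calls it), and report the exit status and how long the call took."""
    params = {**options, "action": "monitor"}
    stdin_text = "".join(f"{key}={value}\n" for key, value in params.items())
    start = time.monotonic()
    try:
        proc = subprocess.run(agent_cmd, input=stdin_text, text=True,
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired,
        # which is roughly what a monitor operation timeout looks like.
        return MonitorResult(False, True, None, time.monotonic() - start, "")
    return MonitorResult(proc.returncode == 0, False, proc.returncode,
                         time.monotonic() - start, proc.stdout)
```

A possible usage, with hypothetical values: `run_fence_monitor(["fence_virsh"], {"ip": "hypervisor.example", "username": "root", "identity_file": "/root/.ssh/id_rsa", "plug": "zc-mail-1"}, timeout=20)`. The option names are assumptions and should be checked against the output of `fence_virsh -o metadata`. Running it around 00:00 and again at a quiet time, then comparing `elapsed`, would help test the cron/backup theory and show how much headroom a larger timeout would really buy.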
