Re: [ClusterLabs] Stonith stops after vSphere restart

Andrei Borzenkov Thu, 22 Feb 2018 03:29:23 -0800

The stonith resource state should have no impact on the actual stonith
operation. It only reflects whether the monitor was successful and
serves as a warning to the administrator that something may be wrong. It
should clear itself automatically once failure-timeout has expired.
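
As a rough sketch of that last point: failure-timeout is a resource meta
attribute and, as far as I know, it is not set by default, so it has to be
configured before failures can expire on their own. The snippet below
assumes a 120s timeout (a made-up value) and that your pcs version accepts
meta options for a stonith resource through "pcs resource meta"; the "run"
helper and the DRY_RUN variable are only there for illustration.

```shell
#!/bin/sh
# Sketch only: let a failed stonith resource recover without manual cleanup.
# Assumptions (not from this thread): the 120s value, and that this pcs
# version lets "pcs resource meta" operate on a stonith resource.
RESOURCE="vmware_soap"     # stonith resource id from the pcs output quoted below
FAILURE_TIMEOUT="120s"     # hypothetical value, adjust for your site

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Let recorded failures expire after FAILURE_TIMEOUT.
run pcs resource meta "$RESOURCE" failure-timeout="$FAILURE_TIMEOUT"

# Show the current failure count so you can watch it clear.
run pcs resource failcount show "$RESOURCE"
```

With DRY_RUN left unset the script only prints the two pcs commands; run it
with DRY_RUN=0 on a cluster node to apply them. Expired failures are only
noticed when the cluster re-evaluates its state, which in Pacemaker 1.1
happens at the latest every cluster-recheck-interval (15 minutes by
default), so the resource may not restart the instant the timeout passes.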


On Thu, Feb 22, 2018 at 1:58 PM,  <j...@disroot.org> wrote:
>
> Hi,
>
> I have a 2 node pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stops.
>
> [root@node1 ~]# pcs status
> Cluster name: psqltest
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
> quorum
> Last updated: Thu Feb 22 11:30:22 2018
> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>
> 2 nodes configured
> 6 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
> Masters: [ node1 ]
> Slaves: [ node2 ]
> Resource Group: pgsqltest
> psqltestfs (ocf::heartbeat:Filesystem): Started node1
> psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
> postgresql-94 (ocf::heartbeat:pgsql): Started node1
> vmware_soap (stonith:fence_vmware_soap): Stopped
>
> Failed Actions:
> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
>
> [root@node1 ~]# pcs stonith show --full
> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1 action=
> pcmk_list_timeout=120s pcmk_monitor_timeout=120s pcmk_status_timeout=120s
> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>
>
> I need to manually perform a "resource cleanup vmware_soap" to bring it
> online again.
> Is there any way to do this automatically?
> Is it possible to detect when vSphere is back online and re-enable stonith?
>
> Thanks.
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>