Re: [ClusterLabs] Stonith stops after vSphere restart

2018-04-03 Thread jota
Hi again,

After restarting the vCenter, everything worked as expected.
Thanks to all.

Have a nice day.

On 23 February 2018 at 07:59, j...@disroot.org wrote:

> Hi all,
> 
> Thanks for your responses.
> With your advice I was able to configure it. I still have to test its 
> operation. When it is
> possible to restart the vCenter, I will post the results.
> Have a nice weekend!
> 

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Hi all,

Thanks for your responses.
With your advice I was able to configure it. I still have to test its 
operation. When it is possible to restart the vCenter, I will post the results.
Have a nice weekend!


On 22 February 2018 at 16:00, "Tomas Jelinek" wrote:

> Try this:
> 
> pcs resource meta vmware_soap failure-timeout=
> 
> Tomas
> 

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Tomas Jelinek

Try this:

pcs resource meta vmware_soap failure-timeout=
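For example, with an illustrative value (60s here is only an example; pick an expiry that suits your environment):

  pcs resource meta vmware_soap failure-timeout=60s

After that interval the recorded failure expires and the cluster can retry the stonith resource on its own.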


Tomas


On 22.2.2018 at 14:55, j...@disroot.org wrote:

Hi,

I am trying to configure the failure-timeout for stonith, but I can only do it
for the other resources.
When I try to enable it for stonith, I get this error: "Error: resource
option(s): 'failure-timeout', are not recognized for resource type:
'stonith::fence_vmware_soap'".

Thanks.


Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Klaus Wenninger
On 02/22/2018 02:55 PM, j...@disroot.org wrote:
> Hi,
>
> I am trying to configure the failure-timeout for stonith, but I can only do
> it for the other resources.
> When I try to enable it for stonith, I get this error: "Error: resource
> option(s): 'failure-timeout', are not recognized for resource type:
> 'stonith::fence_vmware_soap'".

It is a meta-attribute, so 'pcs stonith update ... meta
failure-timeout=...' should work.
Although I'm not 100% sure if it is adhered to properly.
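Spelled out with an illustrative value (60s is only an example), that suggestion would look like:

  pcs stonith update vmware_soap meta failure-timeout=60s

You can then run 'pcs stonith show --full' to confirm the meta attribute is recorded on the resource.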

Regards,
Klaus
 

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Hi,

I am trying to configure the failure-timeout for stonith, but I can only do it
for the other resources.
When I try to enable it for stonith, I get this error: "Error: resource
option(s): 'failure-timeout', are not recognized for resource type:
'stonith::fence_vmware_soap'".

Thanks.

On 22 February 2018 at 13:46, "Andrei Borzenkov" wrote:

> On Thu, Feb 22, 2018 at 2:40 PM,  wrote:
> 
>> Thanks for the responses.
>> 
>> So, if I understand, this is the right behaviour and it does not affect
>> the stonith mechanism.
>> 
>> If I remember correctly, the fault status persists for hours until I fix it
>> manually.
>> Is there any way to modify the expiry time so it cleans itself?
> 
> Yes, as mentioned set failure-timeout resource meta-attribute.
> 


Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
On Thu, Feb 22, 2018 at 2:40 PM,   wrote:
> Thanks for the responses.
>
> So, if I understand, this is the right behaviour and it does not affect
> the stonith mechanism.
>
> If I remember correctly, the fault status persists for hours until I fix it
> manually.
> Is there any way to modify the expiry time so it cleans itself?
>

Yes, as mentioned, set the failure-timeout resource meta-attribute.
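If you want to see whether the failure is still being counted or has already expired, 'pcs resource failcount show vmware_soap' (resource name as used in this thread) prints the current fail count per node.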



Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Thanks for the responses.

So, if I understand, this is the right behaviour and it does not affect the
stonith mechanism.

If I remember correctly, the fault status persists for hours until I fix it
manually.
Is there any way to modify the expiry time so it cleans itself?

On 22 February 2018 at 12:28, "Andrei Borzenkov" wrote:

> Stonith resource state should have no impact on actual stonith
> operation. It only reflects whether the monitor was successful or not and
> serves as a warning to the administrator that something may be wrong. It
> should automatically clear itself after failure-timeout has expired.
> 


Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
Stonith resource state should have no impact on actual stonith
operation. It only reflects whether the monitor was successful or not and
serves as a warning to the administrator that something may be wrong. It
should automatically clear itself after failure-timeout has expired.
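As a sketch, using the resource name from this thread and an illustrative 5-minute expiry:

  pcs resource meta vmware_soap failure-timeout=300
  crm_mon --one-shot --failcounts

Note that expired failures are only cleaned up when the cluster re-evaluates its state, so it can take up to cluster-recheck-interval (15 minutes by default) beyond the failure-timeout before the failed action disappears from status.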

On Thu, Feb 22, 2018 at 1:58 PM,   wrote:
>
> Hi,
>
> I have a 2-node Pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stops.
>
> [root@node1 ~]# pcs status
> Cluster name: psqltest
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
> quorum
> Last updated: Thu Feb 22 11:30:22 2018
> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>
> 2 nodes configured
> 6 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
>     Masters: [ node1 ]
>     Slaves: [ node2 ]
> Resource Group: pgsqltest
>     psqltestfs (ocf::heartbeat:Filesystem): Started node1
>     psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
>     postgresql-94 (ocf::heartbeat:pgsql): Started node1
> vmware_soap (stonith:fence_vmware_soap): Stopped
>
> Failed Actions:
> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
>
> [root@node1 ~]# pcs stonith show --full
> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1 action=
> pcmk_list_timeout=120s pcmk_monitor_timeout=120s pcmk_status_timeout=120s
> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>
>
> I need to manually perform a "resource cleanup vmware_soap" to put it online
> again.
> Is there any way to do this automatically?
> Is it possible to detect when vSphere is online again and re-enable stonith?
>
> Thanks.
>


Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Marek Grac
Hi,

On Thu, Feb 22, 2018 at 11:58 AM,  wrote:

>
> Hi,
>
> I have a 2-node Pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stops.
>

This is expected, as we run the 'monitor' action to find out whether the fence
device is working; I assume it is not responding while vCenter is restarting. If
your fencing device fails, manual intervention makes sense, since you have to
have fencing working in order to prevent data corruption.
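If you want to verify by hand that the device is reachable again once vCenter is back, the agent can be run directly from a node; roughly (option names can differ slightly between fence-agents versions):

  fence_vmware_soap --ip=192.168.1.1 --ssl --ssl-insecure \
      --username='MYDOMAIN\User' --password=mypass --action=list

If that returns the list of machines, the next monitor (or a 'pcs resource cleanup vmware_soap') should succeed.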

m,


>
> [root@node1 ~]# pcs status
> Cluster name: psqltest
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
> quorum
> Last updated: Thu Feb 22 11:30:22 2018
> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>
> 2 nodes configured
> 6 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
>     Masters: [ node1 ]
>     Slaves: [ node2 ]
> Resource Group: pgsqltest
>     psqltestfs (ocf::heartbeat:Filesystem): Started node1
>     psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
>     postgresql-94 (ocf::heartbeat:pgsql): Started node1
> vmware_soap (stonith:fence_vmware_soap): Stopped
>
> Failed Actions:
> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
>
> [root@node1 ~]# pcs stonith show --full
> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1
> action= pcmk_list_timeout=120s pcmk_monitor_timeout=120s
> pcmk_status_timeout=120s
> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>
>
> I need to manually perform a "resource cleanup vmware_soap" to put it
> online again.
> Is there any way to do this automatically?
> Is it possible to detect when vSphere is online again and re-enable stonith?
>
> Thanks.
>
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org