Re: [ClusterLabs] Stonith stops after vSphere restart
Hi again,

After restarting the vCenter, everything worked as expected. Thanks to all.
Have a nice day.

On 23 February 2018 at 7:59, j...@disroot.org wrote:

> Hi all,
>
> Thanks for your responses. With your advice I was able to configure it.
> I still have to test its operation. When it is possible to restart the
> vCenter, I will post the results.
> Have a nice weekend!
Re: [ClusterLabs] Stonith stops after vSphere restart
Hi all,

Thanks for your responses. With your advice I was able to configure it.
I still have to test its operation. When it is possible to restart the
vCenter, I will post the results.
Have a nice weekend!

On 22 February 2018 at 16:00, "Tomas Jelinek" wrote:

> Try this:
>
> pcs resource meta vmware_soap failure-timeout=
>
> Tomas
Re: [ClusterLabs] Stonith stops after vSphere restart
Try this:

pcs resource meta vmware_soap failure-timeout=

Tomas

On 22 February 2018 at 14:55, j...@disroot.org wrote:

> Hi,
>
> I am trying to configure the failure-timeout for stonith, but I can only
> do it for the other resources.
> When I try to enable it for stonith, I get this error: "Error: resource
> option(s): 'failure-timeout', are not recognized for resource type:
> 'stonith::fence_vmware_soap'".
>
> Thanks.
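As a concrete illustration of the suggested command (a sketch; the 300s
value is an arbitrary example, not taken from the thread):

# set failure-timeout as a meta-attribute on the fence resource
[root@node1 ~]# pcs resource meta vmware_soap failure-timeout=300s
# confirm the meta-attribute is now part of the configuration
[root@node1 ~]# pcs stonith show vmware_soap

With this in place, a recorded start or monitor failure should expire on
its own roughly 300 seconds after it occurred, instead of persisting until
a manual cleanup.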
Re: [ClusterLabs] Stonith stops after vSphere restart
On 02/22/2018 02:55 PM, j...@disroot.org wrote:

> Hi,
>
> I am trying to configure the failure-timeout for stonith, but I can only
> do it for the other resources.
> When I try to enable it for stonith, I get this error: "Error: resource
> option(s): 'failure-timeout', are not recognized for resource type:
> 'stonith::fence_vmware_soap'".

It is a meta-attribute, thus 'pcs stonith update ... meta
failure-timeout=...' should work, although I'm not 100% sure it is being
adhered to properly.

Regards,
Klaus
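failure-timeout is interpreted by the cluster itself rather than passed to
the fence agent, which is why it must follow the 'meta' keyword; given as
an ordinary resource option, pcs validates it against the agent's
parameters and rejects it with the error quoted above. A concrete form of
the suggested command (a sketch; the 5min value is an illustrative
example):

# update the existing stonith resource with a failure-timeout
# meta-attribute
[root@node1 ~]# pcs stonith update vmware_soap meta failure-timeout=5min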
Re: [ClusterLabs] Stonith stops after vSphere restart
Hi,

I am trying to configure the failure-timeout for stonith, but I can only
do it for the other resources.
When I try to enable it for stonith, I get this error: "Error: resource
option(s): 'failure-timeout', are not recognized for resource type:
'stonith::fence_vmware_soap'".

Thanks.

On 22 February 2018 at 13:46, "Andrei Borzenkov" wrote:

> On Thu, Feb 22, 2018 at 2:40 PM, j...@disroot.org wrote:
>
>> If I remember correctly, the fault status persists for hours until I
>> fix it manually.
>> Is there any way to modify the expiry time to clean itself?
>
> Yes, as mentioned, set the failure-timeout resource meta-attribute.
Re: [ClusterLabs] Stonith stops after vSphere restart
On Thu, Feb 22, 2018 at 2:40 PM, j...@disroot.org wrote:

> Thanks for the responses.
>
> So, if I understand, this is the right behaviour and it does not affect
> the stonith mechanism.
>
> If I remember correctly, the fault status persists for hours until I fix
> it manually.
> Is there any way to modify the expiry time to clean itself?

Yes, as mentioned, set the failure-timeout resource meta-attribute.
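The same meta-attribute can also be set with Pacemaker's own tooling,
independent of pcs. A sketch, assuming the resource name from the thread
and an illustrative 300-second timeout:

# write failure-timeout into the resource's meta-attributes in the CIB
[root@node1 ~]# crm_resource --resource vmware_soap --meta \
    --set-parameter failure-timeout --parameter-value 300

Note that expired failures are only cleaned up when the cluster next
re-evaluates its state (at the latest after cluster-recheck-interval,
15 minutes by default), so the entry may linger briefly past the
configured timeout.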
Re: [ClusterLabs] Stonith stops after vSphere restart
Thanks for the responses.

So, if I understand, this is the right behaviour and it does not affect
the stonith mechanism.

If I remember correctly, the fault status persists for hours until I fix
it manually.
Is there any way to modify the expiry time to clean itself?

On 22 February 2018 at 12:28, "Andrei Borzenkov" wrote:

> Stonith resource state should have no impact on actual stonith
> operation. It only reflects whether the monitor was successful or not
> and serves as a warning to the administrator that something may be
> wrong. It should automatically clear itself after failure-timeout has
> expired.
Re: [ClusterLabs] Stonith stops after vSphere restart
Stonith resource state should have no impact on actual stonith operation.
It only reflects whether the monitor was successful or not and serves as a
warning to the administrator that something may be wrong. It should
automatically clear itself after failure-timeout has expired.

On Thu, Feb 22, 2018 at 1:58 PM, j...@disroot.org wrote:

> Hi,
>
> I have a 2-node pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stops.
>
> [...]
>
> I need to manually perform a "resource cleanup vmware_soap" to put it
> online again.
> Is there any way to do this automatically?
> Is it possible to detect vSphere being online again and enable stonith?
>
> Thanks.
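Until such a timeout is configured, the recorded failure can be inspected
and cleared by hand. A short sketch (the failcount subcommand is assumed
to be available in this pcs version; cleanup is the command already named
in the thread):

# show the accumulated fail count for the fence resource
[root@node1 ~]# pcs resource failcount show vmware_soap
# clear the recorded failure so the cluster retries the start
[root@node1 ~]# pcs resource cleanup vmware_soap

With failure-timeout set, the cluster effectively performs this cleanup on
its own once the timeout expires.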
Re: [ClusterLabs] Stonith stops after vSphere restart
Hi,

On Thu, Feb 22, 2018 at 11:58 AM, j...@disroot.org wrote:

> Hi,
>
> I have a 2-node pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stops.

This is expected, as we run the 'monitor' action to find out if the fence
device is working. I assume that it is not responding while vCenter is
restarting. If your fencing device fails, then manual intervention makes
sense, as you have to have fencing working in order to prevent data
corruption.

m,

> [root@node1 ~]# pcs status
> Cluster name: psqltest
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
> quorum
> Last updated: Thu Feb 22 11:30:22 2018
> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>
> 2 nodes configured
> 6 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
>  Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  Resource Group: pgsqltest
>      psqltestfs    (ocf::heartbeat:Filesystem):    Started node1
>      psqltest_vip  (ocf::heartbeat:IPaddr2):       Started node1
>      postgresql-94 (ocf::heartbeat:pgsql):         Started node1
>  vmware_soap    (stonith:fence_vmware_soap):    Stopped
>
> Failed Actions:
> * vmware_soap_start_0 on node1 'unknown error' (1): call=38,
>   status=Error, exitreason='none',
>   last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
> * vmware_soap_start_0 on node2 'unknown error' (1): call=56,
>   status=Error, exitreason='none',
>   last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> [root@node1 ~]# pcs stonith show --full
>  Resource: vmware_soap (class=stonith type=fence_vmware_soap)
>   Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443
>    login=MYDOMAIN\User passwd=mypass pcmk_host_list=node1,node2
>    power_wait=3 ssl_insecure=1 action= pcmk_list_timeout=120s
>    pcmk_monitor_timeout=120s pcmk_status_timeout=120s
>   Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>
> I need to manually perform a "resource cleanup vmware_soap" to put it
> online again.
> Is there any way to do this automatically?
> Is it possible to detect vSphere being online again and enable stonith?
>
> Thanks.
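When the fence resource sits in this failed state, the underlying agent
can be exercised by hand to confirm whether vCenter is reachable again. A
sketch using the connection parameters from the stonith configuration
above (standard fence-agent option names are assumed; adjust for your
agent version):

# ask vCenter for the list of machines the agent can see; success
# suggests the monitor action should pass again
[root@node1 ~]# fence_vmware_soap --ip 192.168.1.1 --ipport 443 \
    --username 'MYDOMAIN\User' --password mypass \
    --ssl --ssl-insecure --inet4-only --action list

If this returns the node VMs, a "pcs resource cleanup vmware_soap" (or an
expired failure-timeout) should bring the stonith resource back online.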