[ClusterLabs] ping Resource Agent doesn't work
I configured the ping RA via ocf:pacemaker:ping. During testing the resource agent works fine: it pings my gateway every 10 seconds, with 3 attempts. But I noticed that even if I disconnect the physical link on the node where the resource resides, it doesn't fail over to the other node.

*IP CONFIG:*
172.16.10.0/24 = interface to gateway
172.16.11.0/24 = heartbeat link (interface to avoid split brain, as my goal is to fail over the ping resource only after 3 failed attempts to ping my gateway)

*CONFIG:*
[root@node1 ~]# pcs config
Cluster Name: clusterPa
Corosync Nodes:
 node1 node2
Pacemaker Nodes:
 node1 node2

Resources:
 Resource: ping-gateway (class=ocf provider=pacemaker type=ping)
  Attributes: host_list=172.16.10.1
  Operations: monitor interval=10 timeout=60 (ping-gateway-monitor-interval-10)
              start interval=0s timeout=60 (ping-gateway-start-interval-0s)
              stop interval=0s timeout=20 (ping-gateway-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: clusterPa
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531899781
 stonith-enabled: false

Quorum:
  Options:
[root@node1 ~]#

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
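[Editor's aside, not from the original post: on its own, the ping resource only records connectivity in a node attribute (named "pingd" by default); nothing moves unless some constraint reads that attribute. A minimal sketch of the usual pattern, assuming a hypothetical dependent resource named MyResource:]

```
# Run the connectivity check on every node, not just one
pcs resource clone ping-gateway

# Keep MyResource off any node whose pingd attribute is missing or 0
# (MyResource is a placeholder; substitute the resource you want to fail over)
pcs constraint location MyResource rule score=-INFINITY pingd lt 1 or not_defined pingd
```

With only the ping resource defined, as in the config above, no failover of anything else is expected.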
Re: [ClusterLabs] Weird Fencing Behavior
>> Rhel1 stonith-ng[1473]: warning: Mapping action='off' to pcmk_reboot_action='off'
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:46 ArcoSRhel1 fence_vmware_soap: Unable to connect/login to fencing device
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ Unable to connect/login to fencing device ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
>>
>>>> See my config below:
>>>>
>>>> [root@ArcosRhel2 cluster]# pcs config
>>>> Cluster Name: ARCOSCLUSTER
>>>> Corosync Nodes:
>>>>  ArcosRhel1 ArcosRhel2
>>>> Pacemaker Nodes:
>>>>  ArcosRhel1 ArcosRhel2
>>>>
>>>> Resources:
>>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>   Attributes: cidr_netmask=32 ip=172.16.10.243
>>>>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>>>>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>>>>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>>>
>>>> Stonith Devices:
>>>>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>>>>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
>>>>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>>>>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>>>>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
>>>>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
>>>> Fencing Levels:
>>>>
>>>> Location Constraints:
>>>>   Resource: Fence1
>>>>     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
>>>>   Resource: fence2
>>>>     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
>>>> Ordering Constraints:
>>>> Colocation Constraints:
>>>> Ticket Constraints:
>>>>
>>>> Alerts:
>>>>  No alerts defined
>>>>
>>>> Resources Defaults:
>>>>  No defaults set
>>>> Operations Defaults:
>>>>  No defaults set
>>>>
>>>> Cluster Properties:
>>>>  cluster-infrastructure: corosync
>>>>  cluster-name: ARCOSCLUSTER
>>>>  dc-version: 1.1.16-12.el7-94ff4df
>>>>  have-watchdog: false
>>>>  last-lrm-refresh: 1531810841
>>>>  stonith-enabled: true
>>>>
>>>> Quorum:
>>>>   Options:

On Wed, Jul 18, 2018 at 8:00 PM, wrote:
> Today's Topics:
>  1. Re: Weird Fencing Behavior (Andrei Borzenkov)
>  2. Re: Weird Fencing Behavior (Klaus Wenninger)
>
> Message: 1
> Date: Wed, 18 Jul 2018 07:22:25 +0300
> From: Andrei Borzenkov
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Weird Fencing Behavior
Re: [ClusterLabs] Weird Fencing Behavior
> > Hi,
> >
> > On my two-node active/passive setup, I configured fencing via
> > fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected
> > that both nodes would be stonithed simultaneously.
> >
> > On my test scenario, Node1 has the ClusterIP resource. When I disconnect
> > the service/corosync link physically, Node1 is fenced and Node2 stays
> > alive, given pcmk_delay=0 on both nodes.
> >
> > Can you explain the behavior above?
>
> #node1 could not connect to ESX because links were disconnected. As the
> #most obvious explanation.
>
> #You have logs, you are the only one who can answer this question with
> #some certainty. Others can only guess.

Oops, my bad. I forgot to tell: I have two interfaces on each virtual machine (node). The second interface is used for the ESX links, so fencing can be executed even though the corosync links are disconnected. Looking forward to your response. Thanks

#Having no fence delay means a death match (each node killing the other)
#is possible, but it doesn't guarantee that it will happen. Some of the
#time, one node will detect the outage and fence the other one before
#the other one can react.

#It's basically an Old West shoot-out -- they may reach for their guns
#at the same time, but one may be quicker.

#As Andrei suggested, the logs from both nodes could give you a timeline
#of what happened when.

Hi Andrei, kindly see the logs below. Based on the log timestamps, Node1 should have fenced Node2 first, but in the actual test Node1 was fenced/shut down by Node2. Is it possible, in a two-node active/passive pacemaker/corosync setup, for the node that gets disconnected (interface down) to be the only one that gets fenced? Thanks guys

*LOGS from Node2:*
Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor failed, forming new configuration.
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] A new membership (172.16.10.242:220) was formed. Members left: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] Failed to receive the leave message. failed: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [QUORUM] Members[1]: 2
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Removing all ArcosRhel1 attributes for peer loss
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Lost attribute writer ArcosRhel1
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Our DC node (ArcosRhel1) left the cluster
Jul 17 13:33:28 ArcosRhel2 pacemakerd[1074]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_NOT_DC -> S_ELECTION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_ELECTION -> S_INTEGRATION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 will be fenced because the node is no longer part of the cluster
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 is unclean
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action fence2_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action ClusterIP_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Scheduling Node ArcosRhel1 for STONITH
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move fence2#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move ClusterIP#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-20.bz2
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Requesting fencing (reboot) of node ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Initiating start operation fence2_start_0 locally on ArcosRhel2
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Client crmd.1084.cd70178e wants to fence (reboot) 'ArcosRhel1' with device '(any)'
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Requesting peer fencing (reboot) of ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Fence1
[ClusterLabs] Weird Fencing Behavior
> Hi,
>
> On my two-node active/passive setup, I configured fencing via
> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected
> that both nodes would be stonithed simultaneously.
>
> On my test scenario, Node1 has the ClusterIP resource. When I disconnect
> the service/corosync link physically, Node1 is fenced and Node2 stays
> alive, given pcmk_delay=0 on both nodes.
>
> Can you explain the behavior above?

#node1 could not connect to ESX because links were disconnected. As the
#most obvious explanation.

#You have logs, you are the only one who can answer this question with
#some certainty. Others can only guess.

Oops, my bad. I forgot to tell: I have two interfaces on each virtual machine (node). The second interface is used for the ESX links, so fencing can be executed even though the corosync links are disconnected. Looking forward to your response. Thanks

> See my config below:
>
> [root@ArcosRhel2 cluster]# pcs config
> Cluster Name: ARCOSCLUSTER
> Corosync Nodes:
>  ArcosRhel1 ArcosRhel2
> Pacemaker Nodes:
>  ArcosRhel1 ArcosRhel2
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=172.16.10.243
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>
> Stonith Devices:
>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: Fence1
>     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
>   Resource: fence2
>     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: ARCOSCLUSTER
>  dc-version: 1.1.16-12.el7-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1531810841
>  stonith-enabled: true
>
> Quorum:
>   Options:
[ClusterLabs] Weird Fencing Behavior?
Hi,

On my two-node active/passive setup, I configured fencing via fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected that both nodes would be stonithed simultaneously.

On my test scenario, Node1 has the ClusterIP resource. When I disconnect the service/corosync link physically, Node1 is fenced and Node2 stays alive, given pcmk_delay=0 on both nodes.

Can you explain the behavior above?

See my config below:

[root@ArcosRhel2 cluster]# pcs config
Cluster Name: ARCOSCLUSTER
Corosync Nodes:
 ArcosRhel1 ArcosRhel2
Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531810841
 stonith-enabled: true

Quorum:
  Options:
[ClusterLabs] What triggers fencing?
Message: 2
Date: Wed, 11 Jul 2018 16:33:31 +0200
From: Klaus Wenninger
To: Ken Gaillot, Cluster Labs - All topics related to open-source clustering welcomed, Andrei Borzenkov
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID: <2bf61b9f-98b0-482f-fa65-263ba9490...@redhat.com>
Content-Type: text/plain; charset=utf-8

On 07/11/2018 04:11 PM, Ken Gaillot wrote:
> On Wed, 2018-07-11 at 11:06 +0200, Klaus Wenninger wrote:
>> On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
>>> 11.07.2018 05:45, Confidential Company wrote:
>>>> Not true, the faster node will kill the slower node first. It is
>>>> possible that through misconfiguration, both could die, but it's rare
>>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>>> node you want to win.
>>>>
>>>> Don't use a delay on the other node, just the node you want to live in
>>>> such a case.
>>>>
>>>> **
>>>> 1. Given Active/Passive setup, resources are active on Node1
>>>> 2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
>>>> 3. Node2 goes down
> What do you mean by "down" in this case?
>
> If you mean the host itself has crashed, then it will not do anything,
> and node1 will fence it.
>
> If you mean node2's network goes out, so it's still functioning but no
> one can reach the managed service on it, then you are correct, the
> "wrong" node can get shot -- because you didn't specify anything about
> what the right node would be. This is a somewhat tricky area, but it
> can be done with a quorum-only node, qdisk, or fence_heuristics_ping,
> all of which are different ways of "preferring" the node that can reach
> a certain host.

Or in other words: why would I - as a cluster node - shoot the peer to be able to start the services locally, if I can somehow tell beforehand that my services wouldn't be reachable by anybody anyway (e.g. network disconnected)? Then it might make more sense to sit still and wait to be shot by the other side, for the case that that guy is luckier and has e.g. access to the network.

-Klaus

In the case of a 2-node setup, both nodes know nothing about whether their services are reachable by anybody. Sharing my config and my tests:

Last login: Thu Jul 12 14:57:21 2018
[root@ArcosRhel1 ~]# pcs config
Cluster Name: ARCOSCLUSTER
Corosync Nodes:
 ArcosRhel1 ArcosRhel2
Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.201 login=test passwd=testing pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.202 login=test passwd=testing pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531375458
 stonith-enabled: true

Quorum:
  Options:
[root@ArcosRhel1 ~]#

**Test scenario:
Given: nodes have two interfaces (ens192 for corosync traffic / ens224 for ESXi traffic)

a.) Node1=Active and Node2=Passive. Action = disconnect ens192 of Node1. Output = Node2 was fenced and shut down
b.) Node1=Passive and Node2=Active. Action = disconnect ens192 of Node1. Output = Node1 was fenced and shut down
c.) Node1=Passive and Node2=Active. Action = disconnect ens192 of Node2. Output = Node2 was fenced and shut down

Thanks,
imnotarobot

> If you mean the cluster-managed resource crashes on node2, but node2
> itself is still functioning properly, then what h
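[Editor's aside: a hedged sketch of the fence_heuristics_ping approach Ken mentions, using the device names from this thread; the heuristic device name ping_check is hypothetical, and the target IP is illustrative. The idea is a fencing level that runs the ping heuristic first, so the real fence device only fires if the node initiating fencing can still reach the target:]

```
# Level-1 fencing for each node: heuristic first, then the real device.
# If the initiating node cannot ping the target, the heuristic agent
# fails and the rest of that fencing level is not executed.
pcs stonith create ping_check fence_heuristics_ping ping_targets=172.16.10.1
pcs stonith level add 1 ArcosRhel1 ping_check,Fence1
pcs stonith level add 1 ArcosRhel2 ping_check,fence2
```

This way a node that has lost its own network tends not to win the shoot-out, which is the asymmetry the test scenarios above are missing.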
Re: [ClusterLabs] What triggers fencing?
Message: 1
Date: Wed, 11 Jul 2018 11:06:56 +0200
From: Klaus Wenninger
To: Cluster Labs - All topics related to open-source clustering welcomed, Andrei Borzenkov
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID:
Content-Type: text/plain; charset=utf-8

On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> 11.07.2018 05:45, Confidential Company wrote:
>> Not true, the faster node will kill the slower node first. It is
>> possible that through misconfiguration, both could die, but it's rare
>> and easily avoided with a 'delay="15"' set on the fence config for the
>> node you want to win.
>>
>> Don't use a delay on the other node, just the node you want to live in
>> such a case.
>>
>> **
>> 1. Given Active/Passive setup, resources are active on Node1
>> 2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
>> 3. Node2 goes down
>> 4. Node1 thinks Node2 is down / Node2 thinks Node1 is down
> If node2 is down, it cannot think anything.

True. Assuming it is not really down but just somehow disconnected for my answer below.

>> 5. fence1 counts 15 seconds before it fences Node1, while fence2 counts 30 seconds before it fences Node2
>> 6. Since fence1 has a shorter delay than fence2, fence1 executes and shuts down Node1.
>> 7. fence1 (action: shutdown Node1) will always trigger first because it has a shorter delay than fence2.
>>
>> ** Okay, what's important is that they should be different. But in the case
>> above, even though Node2 goes down, Node1 has the shorter delay, so Node1 gets
>> fenced/shut down. This is a sample scenario. I don't get the point. Can you
>> comment on this?

You didn't send the actual config, but from your description I get the scenario this way: fencing resource fence1 is running on Node2, it is there to fence Node1, and it has a delay of 15s. Fencing resource fence2 is running on Node1, it is there to fence Node2, and it has a delay of 30s.

If they now begin to fence each other at the same time, the node actually fenced would of course be Node1, as fencing resource fence1 is going to shoot 15s earlier than fence2. Looks consistent to me ...

Regards,
Klaus

***
Yes, that is right, Klaus. fence1 running on Node2 will fence Node1; fence1 will execute first whichever node goes down, because it has the shorter delay. But if Node2 goes down or is disconnected, how can it be fenced by Node1 using fence2, if fence2 cannot be triggered because fence1 always comes first? My point here is that giving a delay to fencing resolves the issue of double fencing, but it doesn't resolve, or doesn't know, which node should be fenced. Even though Node2 gets disconnected, Node1 will be fenced and the whole service goes down.

**Let me share my actual config: I have two ESXi hosts, 2 virtual machines, and 2 interfaces on each (1 = corosync interface, 1 = interface for the VM to contact its ESXi host)

Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.201 login=test passwd=testing pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1 ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.202 login=test passwd=testing pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2 ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531300540
 stonith-enabled: true
*

>>
>> Thanks
>>
>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger wrote:
>>> On 07/09/2018 05:53 PM, Digimer
Re: [ClusterLabs] What triggers fencing?
Not true, the faster node will kill the slower node first. It is possible that through misconfiguration both could die, but it's rare and easily avoided with a 'delay="15"' set on the fence config for the node you want to win.

Don't use a delay on the other node, just the node you want to live in such a case.

**
1. Given an Active/Passive setup, resources are active on Node1
2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
3. Node2 goes down
4. Node1 thinks Node2 is down / Node2 thinks Node1 is down
5. fence1 counts 15 seconds before it fences Node1, while fence2 counts 30 seconds before it fences Node2
6. Since fence1 has a shorter delay than fence2, fence1 executes and shuts down Node1.
7. fence1 (action: shutdown Node1) will always trigger first because it has a shorter delay than fence2.
**

Okay, what's important is that they should be different. But in the case above, even though Node2 goes down, Node1 has the shorter delay, so Node1 gets fenced/shut down. This is a sample scenario. I don't get the point. Can you comment on this?

Thanks

On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger wrote:
> On 07/09/2018 05:53 PM, Digimer wrote:
> > On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> >> On 07/09/2018 05:33 PM, Digimer wrote:
> >>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
> >>>> On 07/09/2018 03:49 PM, Digimer wrote:
> >>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> >>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Any ideas what triggers the fencing script or stonith?
> >>>>>>>
> >>>>>>> Given the setup below:
> >>>>>>> 1. I have two nodes
> >>>>>>> 2. Configured fencing on both nodes
> >>>>>>> 3. Configured delay=15 and delay=30 on fence1 (for Node1) and
> >>>>>>> fence2 (for Node2) respectively
> >>>>>>>
> >>>>>>> *What does it mean to configure a delay in stonith? Wait for 15 seconds
> >>>>>>> before it fences the node?
> >>>>>> Given that on a 2-node cluster you don't have real quorum to make one
> >>>>>> partial cluster fence the rest of the nodes, the different delays are meant
> >>>>>> to prevent a fencing race.
> >>>>>> Without different delays that would lead to both nodes fencing each
> >>>>>> other at the same time - finally both being down.
> >>>>> Not true, the faster node will kill the slower node first. It is
> >>>>> possible that through misconfiguration, both could die, but it's rare
> >>>>> and easily avoided with a 'delay="15"' set on the fence config for the
> >>>>> node you want to win.
> >>>> What exactly is not true? Aren't we saying the same?
> >>>> Of course one of the delays can be 0 (most important is that
> >>>> they are different).
> >>> Perhaps I misunderstood your message. It seemed to me that the
> >>> implication was that fencing in 2-node without a delay always ends up
> >>> with both nodes being down, which isn't the case. It can happen if the
> >>> fence methods are not set up right (ie: the node isn't set to immediately
> >>> power off on ACPI power button event).
> >> Yes, a misunderstanding I guess.
> >>
> >> Should have been more verbose in saying that due to the
> >> time between the fencing command fired off to the fencing
> >> device and the actual fencing taking place (as you state,
> >> dependent on how it is configured in detail - but a measurable
> >> time in all cases) there is a certain probability that when
> >> both nodes start fencing at roughly the same time we will
> >> end up with 2 nodes down.
> >>
> >> Everybody has to find his own tradeoff between how reliably
> >> fence races are prevented and the fencing delay, I guess.
> > We've used this:
> >
> > 1. IPMI (with the guest OS set to immediately power off) as primary,
> > with a 15 second delay on the active node.
> >
> > 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
> > for when IPMI fails, with no delay.
> >
> > In ~8 years, across dozens and dozens of clusters and countless fence
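[Editor's aside: Digimer's advice above, sketched as pcs commands with the device names used elsewhere in this thread. `delay` here is the static fence-agent delay, distinct from `pcmk_delay_max`; treat the exact values as illustrative.]

```
# Put the delay only on the device that TARGETS the node you want to
# survive. Fence1 targets ArcosRhel1, so delaying it gives ArcosRhel1
# a 15-second head start to shoot ArcosRhel2 in a split.
pcs stonith update Fence1 delay=15
# fence2 targets ArcosRhel2; leave it without a delay.
```

This breaks the symmetry of the shoot-out so only one outcome of a death match is likely, at the cost of a fixed 15s added to any legitimate fencing of ArcosRhel1.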
[ClusterLabs] What triggers fencing?
Hi,

Any ideas what triggers the fencing script or stonith?

Given the setup below:
1. I have two nodes
2. Configured fencing on both nodes
3. Configured delay=15 and delay=30 on fence1 (for Node1) and fence2 (for Node2) respectively

*What does it mean to configure a delay in stonith? Wait for 15 seconds before it fences the node?
*Given Node1 is active and Node2 goes down, does it mean fence1 will execute first and shut down Node1, even though it was Node2 that went down?

Thanks
imnotarobot
Re: [ClusterLabs] Resource-stickiness is not working
On Sat, 2018-06-02 at 22:14 +0800, Confidential Company wrote:
> On Fri, 2018-06-01 at 22:58 +0800, Confidential Company wrote:
> > Hi,
> >
> > I have a two-node active/passive setup. My goal is to fail over a
> > resource once a node goes down, with as little downtime as possible.
> > Based on my testing, when Node1 goes down it fails over to Node2. If
> > Node1 comes back up after link reconnection (reconnecting the physical
> > cable), the resource fails back to Node1 even though I configured
> > resource-stickiness. Is there something wrong with the configuration below?
> >
> > #service firewalld stop
> > #vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) ----- Private Network (Direct connect)
> > #systemctl start pcsd.service
> > #systemctl enable pcsd.service
> > #passwd hacluster --> define pw
> > #pcs cluster auth Node1 Node2
> > #pcs cluster setup --name Cluster Node1 Node2
> > #pcs cluster start --all
> > #pcs property set stonith-enabled=false
> > #pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
> > #pcs resource defaults resource-stickiness=100
> >
> > Regards,
> > imnotarobot
>
> Your configuration is correct, but keep in mind scores of all kinds
> will be added together to determine where the final placement is.
>
> In this case, I'd check that you don't have any constraints with a
> higher score preferring the other node. For example, if you previously
> did a "move" or "ban" from the command line, that adds a constraint
> that has to be removed manually if you no longer want it.
> --
> Ken Gaillot
>
> >>>>>>>>>>
> I'm confused. A constraint, from what I understand, means there's a
> preferred node. But if I want my resources not to have a preferred node,
> is that possible?
>
> Regards,
> imnotarobot

Yes, that's one type of constraint -- but you may not have realized you added one if you ran something like "pcs resource move", which is a way of saying there's a preferred node.

There are a variety of other constraints. For example, as you add more resources, you might say that resource A can't run on the same node as resource B, and if that constraint's score is higher than the stickiness, A might move if B starts on its node.

To see your existing constraints using pcs, run "pcs constraint show". If there are any you don't want, you can remove them with various pcs commands.
--
Ken Gaillot

>>>>>>>>>>
Correct me if I'm wrong. So the resource-stickiness policy cannot be used alone; a constraint configuration has to be set up to make it work, and the outcome depends on the relative scores of the two. Can you suggest what type of constraint configuration I should set to achieve the simple goal above?

Regards,
imnotarobot
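[Editor's aside: Ken's suggestion sketched as commands. The constraint id shown is the pattern pcs auto-generates for "pcs resource move ClusterIP"; your actual id may differ, so check the listing first.]

```
# List all constraints with their ids and scores
pcs constraint show --full

# Remove a leftover location constraint by its id
# (cli-prefer-<resource> is the id "pcs resource move" typically creates)
pcs constraint remove cli-prefer-ClusterIP

# Or, for move/ban leftovers, clear them in one step:
pcs resource clear ClusterIP
```

Once no location constraint outscores the stickiness of 100, the resource stays where it is after Node1 rejoins.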
[ClusterLabs] Resource-stickiness is not working
On Fri, 2018-06-01 at 22:58 +0800, Confidential Company wrote:
> Hi,
>
> I have a two-node active/passive setup. My goal is to fail over a
> resource once a node goes down, with as little downtime as possible.
> Based on my testing, when Node1 goes down the resource fails over to
> Node2. When Node1 comes back up (after the physical cable is
> reconnected), the resource fails back to Node1 even though I
> configured resource-stickiness. Is there something wrong with the
> configuration below?
>
> #service firewalld stop
> #vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) ----- Private Network (Direct connect)
> #systemctl start pcsd.service
> #systemctl enable pcsd.service
> #passwd hacluster --> define pw
> #pcs cluster auth Node1 Node2
> #pcs cluster setup --name Cluster Node1 Node2
> #pcs cluster start --all
> #pcs property set stonith-enabled=false
> #pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
> #pcs resource defaults resource-stickiness=100
>
> Regards,
> imnotarobot

Your configuration is correct, but keep in mind that scores of all kinds are added together to determine the final placement.

In this case, I'd check that you don't have any constraints with a higher score preferring the other node. For example, if you previously did a "move" or "ban" from the command line, that adds a constraint that has to be removed manually if you no longer want it.
--
Ken Gaillot

>>>>>>>>>>

I'm confused. A constraint, as I understand it, means there is a preferred node. But if I want my resources not to have a preferred node, is that possible?

Regards,
imnotarobot
[ClusterLabs] Resource-stickiness is not working
Hi,

I have a two-node active/passive setup. My goal is to fail over a resource once a node goes down, with as little downtime as possible. Based on my testing, when Node1 goes down the resource fails over to Node2. When Node1 comes back up (after the physical cable is reconnected), the resource fails back to Node1 even though I configured resource-stickiness. Is there something wrong with the configuration below?

#service firewalld stop
#vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) - Private Network (Direct connect)
#systemctl start pcsd.service
#systemctl enable pcsd.service
#passwd hacluster --> define pw
#pcs cluster auth Node1 Node2
#pcs cluster setup --name Cluster Node1 Node2
#pcs cluster start --all
#pcs property set stonith-enabled=false
#pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
#pcs resource defaults resource-stickiness=100

Regards,
imnotarobot
[ClusterLabs] ethmonitor is not working
Hi,

I have a two-node active/passive setup. This is my configuration:

#service firewalld stop
#vi /etc/hosts --> 192.168.2.121 (Node1) / 192.168.2.122 (Node2) - Private Network (Direct connect)
#systemctl start pcsd.service
#systemctl enable pcsd.service
#passwd hacluster --> define pw
#pcs cluster auth Node1 Node2
#pcs cluster setup --name Cluster Node1 Node2
#pcs cluster start --all
#pcs property set stonith-enabled=false
#pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
#pcs resource defaults resource-stickiness=100
#pcs resource create ens192-monitor ethmonitor interface=ens192 --clone
#pcs constraint location ClusterIP rule score=-INFINITY ethmonitor-ens192 ne 1

My goal is to have two interfaces: a service network (ens192) and a heartbeat network (ens224). Based on my research, ethmonitor is used to monitor an interface via pacemaker; if there is a failed link, resources fail over to the other node. My problem is that no errors are reported in "pcs status", the resource does not fail over, and the service (ClusterIP) goes down.

Testing scenario:
1. Disconnected the physical link of Node1 --> No error appears in "pcs status", and ClusterIP is not reachable because failover to the other node does not happen.

Any ideas? Is this a bug? Are there any missing configurations?

Regards,
imnotarobot
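One way to narrow this down (a sketch; the attribute and node names follow the configuration above) is to check whether the ethmonitor clone is actually updating the transient node attribute that the location rule depends on:

```shell
# ethmonitor publishes a transient node attribute named
# ethmonitor-<interface>; 1 means the link is up, 0 means it is down
crm_attribute --node Node1 --name ethmonitor-ens192 --query --lifetime reboot
crm_attribute --node Node2 --name ethmonitor-ens192 --query --lifetime reboot
```

The rule only forces the resource away when the attribute is present and not equal to 1, so if the value on Node1 never flips to 0 after the cable is pulled, the monitor operation on the clone (or its interval) is the place to look.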
[ClusterLabs] ethmonitor RA agent error. How can I fix this? (RHEL)
I have two virtual machines with two network interfaces each. See the configuration below:

*eth0 - service network
*eth1 - heartbeat network
*vi /etc/hosts - RhelA(ip of eth1) / RhelB(ip of eth1)
*service firewalld stop
*pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=(virtual ip) cidr_netmask=32 op monitor interval=30s
*pcs resource create eth1-monitor ethmonitor interface=eth1 --clone
*pcs constraint location VirtualIP rule score=-INFINITY ethmonitor-ens192 ne 1

+++

I tried to ifdown eth0 (service network). The result:
1. The VirtualIP resource switched to Node2.
2. Got an error from "pcs status": "unable to find nic".
3. Even after a successful failover, the error still exists.

Since it automatically switched to Node2, my goal is to fail over again to Node1. This is what I did:
1. Enabled eth0 of Node1 and waited 15 seconds.
2. Disabled eth0 of Node2.
3. The VirtualIP resource stopped.
4. Even after enabling eth0 of Node1, the error from the previous procedure still exists.
5. Got an additional error; I have two errors now.
6. The VirtualIP resource doesn't start.

Regards,
imnotarobot
[ClusterLabs] ethmonitor configuration
Hi,

1. What is the configuration for ethmonitor if I want to set up two NICs (data and heartbeat)? I have a VirtualIP resource, and this VirtualIP should only run on a node where the ethmonitor resource reports the ethernet device as up.
2. What is the minimum monitoring interval of the ethmonitor RA?

Regards,
imnotarobot
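The pattern used elsewhere in this thread can be sketched as follows (a sketch; the interface name eth0 and the resource names are illustrative and must match your actual data NIC and VIP resource):

```shell
# Monitor the data NIC on every node; ethmonitor publishes a node
# attribute named ethmonitor-eth0 (1 = link up, 0 = link down)
pcs resource create eth0-monitor ocf:heartbeat:ethmonitor interface=eth0 --clone

# Keep the VIP off any node whose data NIC is not reported as up
pcs constraint location VirtualIP rule score=-INFINITY ethmonitor-eth0 ne 1
```

The heartbeat NIC needs no ethmonitor resource of its own; corosync already detects its failure. Note that the attribute name in the rule must match the monitored interface exactly, or the rule never fires.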
[ClusterLabs] Two-node cluster fencing
Message: 2
Date: Sun, 13 May 2018 10:20:36 +0300
From: Andrei Borzenkov
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Two-node cluster fencing
Message-ID:
Content-Type: text/plain; charset=utf-8

12.05.2018 07:31, Confidential Company writes:
> Hi,
>
> This is my setup:
>
> 1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
> 2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
> 3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
> 4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
> 5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.

Then why do you need a failover cluster in the first place? Why can't you have both nodes running at the same time and use a virtual server to let clients connect to either of them?

*Reply to Andrei:* I want it to be as simple as possible. If I set them up as active/active, I need a load balancer (nginx, I assume). Or maybe you have suggestions. Thanks bro

> 6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
> 7. Seamless failover from primary to secondary is not required.
> 8. Active/passive setup
>
> Given the setup above,
> 1. Are there any drawbacks?
> 2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
> 3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?
> Thanks,
> imnotarobot
[ClusterLabs] Two-node cluster fencing
Hi Casey,

1. I tried shutting down my VM while testing; the "ClusterIP" resource switched automatically to the standby node (Node2).
2. I ran "systemctl enable corosync pacemaker" so that after a reboot, corosync and pacemaker start automatically.
3. As I turned Node1 back on, I experienced downtime (maybe the syncing of nodes causes downtime), but my cluster still works as expected --> the active node is still Node2.
4. If I choose ESXi as my fence device and the physical server goes down, would fencing still work, given that it's on one host?

Thanks Casey, I want to understand more about fencing.

Regards,
imnotarobot

>> Without fencing, if the primary is powered off abruptly (e.g. if one of your ESX servers crashes), the standby will not become primary, and you will need to promote it manually. We had exactly this scenario happen last week with a 2-node cluster. Without fencing, you don't have high availability. If you don't need high availability, you probably don't need pacemaker.
>>
>> There are instructions for setting up fencing with vmware here:
>> https://www.hastexo.com/resources/hints-and-kinks/fencing-vmware-virtualized-pacemaker-nodes/
>>
>> One note - rather than the SDK, I believe you actually need the CLI package, which can be found here:
>> https://my.vmware.com/web/vmware/details?downloadGroup=VCLI600&productId=491
>>
>> Good luck - I haven't managed to get it to build yet. vmware gives you a black-box installer script that compiles a bunch of dependent perl modules, and it ends up hung at 100% CPU usage for days. Digging into this further with lsof and friends, it seems to be prompting for where your apache source code is so it can compile mod_perl. Why does it need mod_perl for the CLI?? Anyway, I haven't managed to get past that roadblock yet. I'm using Ubuntu 16, so it may happen to just work better on your RHEL instances. If you have a different ESX version than 6.0, you may have better luck as well.
Best wishes,
--
Casey

> On May 11, 2018, at 10:31 PM, Confidential Company wrote:
>
> Hi,
>
> This is my setup:
>
> 1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
> 2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
> 3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
> 4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
> 5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.
> 6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
> 7. Seamless failover from primary to secondary is not required.
> 8. Active/passive setup
>
> Given the setup above,
> 1. Are there any drawbacks?
> 2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
> 3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?
>
> Thanks,
>
> imnotarobot
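For reference, per-node fence_vmware_soap devices along the lines of the hastexo article can be sketched roughly like this (a sketch only; the vCenter address, credentials, and names are placeholders -- "port" must be the VM's name in vSphere and pcmk_host_list the matching cluster node name):

```shell
# One fence device per node, each only allowed to fence the other VM
pcs stonith create fence-rhelA fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl_insecure=1 \
    port=RhelA pcmk_host_list=RhelA op monitor interval=60s

pcs stonith create fence-rhelB fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl_insecure=1 \
    port=RhelB pcmk_host_list=RhelB op monitor interval=60s

# Re-enable fencing once both devices verify successfully
pcs property set stonith-enabled=true
```

fence_vmware_soap talks to the vCenter/ESXi SOAP API directly, so it avoids the perl CLI build problems described above; verify each device with "fence_vmware_soap -a <ipaddr> -l <login> -p <passwd> -z -o list" before trusting it in the cluster.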
[ClusterLabs] Two-node cluster fencing
Hi,

This is my setup:

1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.
6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
7. Seamless failover from primary to secondary is not required.
8. Active/passive setup

Given the setup above,
1. Are there any drawbacks?
2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?

Thanks,

imnotarobot