[ClusterLabs] ping Resource Agent doesn't work
I configured the ping RA via ocf:pacemaker:ping. During testing the resource agent works fine: it pings my gateway every 10 seconds, with 3 attempts. But I noticed that even if I disconnect the physical link on the node where the resource resides, it doesn't fail over to the other node.

*IP CONFIG:*
172.16.10.0/24 = interface to gateway
172.16.11.0/24 = heartbeat link (interface to avoid split brain, as my goal is to fail over the ping resource only after 3 failed attempts to ping my gateway)

*CONFIG:*
[root@node1 ~]# pcs config
Cluster Name: clusterPa
Corosync Nodes:
 node1 node2
Pacemaker Nodes:
 node1 node2

Resources:
 Resource: ping-gateway (class=ocf provider=pacemaker type=ping)
  Attributes: host_list=172.16.10.1
  Operations: monitor interval=10 timeout=60 (ping-gateway-monitor-interval-10)
              start interval=0s timeout=60 (ping-gateway-start-interval-0s)
              stop interval=0s timeout=20 (ping-gateway-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: clusterPa
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531899781
 stonith-enabled: false

Quorum:
  Options:
[root@node1 ~]#

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
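[Editor's aside, not from the original post: on its own, the ping resource only records connectivity in a node attribute (named "pingd" by default); nothing moves unless some constraint reads that attribute. A minimal sketch of the usual pattern, assuming a hypothetical dependent resource named MyResource:]

```
# Run the connectivity check on every node, not just one
pcs resource clone ping-gateway

# Keep MyResource off any node whose pingd attribute is missing or 0
# (MyResource is a placeholder; substitute the resource you want to fail over)
pcs constraint location MyResource rule score=-INFINITY pingd lt 1 or not_defined pingd
```

With only the ping resource defined, as in the config above, no failover of anything else is expected.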
Re: [ClusterLabs] Weird Fencing Behavior
>> Rhel1 stonith-ng[1473]: warning: Mapping action='off' to pcmk_reboot_action='off'
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:46 ArcoSRhel1 fence_vmware_soap: Unable to connect/login to fencing device
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ Unable to connect/login to fencing device ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
>>
>>>> See my config below:
>>>>
>>>> [root@ArcosRhel2 cluster]# pcs config
>>>> Cluster Name: ARCOSCLUSTER
>>>> Corosync Nodes:
>>>>  ArcosRhel1 ArcosRhel2
>>>> Pacemaker Nodes:
>>>>  ArcosRhel1 ArcosRhel2
>>>>
>>>> Resources:
>>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>   Attributes: cidr_netmask=32 ip=172.16.10.243
>>>>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>>>>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>>>>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>>>
>>>> Stonith Devices:
>>>>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>>>>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
>>>>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>>>>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>>>>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
>>>>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
>>>> Fencing Levels:
>>>>
>>>> Location Constraints:
>>>>   Resource: Fence1
>>>>     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
>>>>   Resource: fence2
>>>>     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
>>>> Ordering Constraints:
>>>> Colocation Constraints:
>>>> Ticket Constraints:
>>>>
>>>> Alerts:
>>>>  No alerts defined
>>>>
>>>> Resources Defaults:
>>>>  No defaults set
>>>> Operations Defaults:
>>>>  No defaults set
>>>>
>>>> Cluster Properties:
>>>>  cluster-infrastructure: corosync
>>>>  cluster-name: ARCOSCLUSTER
>>>>  dc-version: 1.1.16-12.el7-94ff4df
>>>>  have-watchdog: false
>>>>  last-lrm-refresh: 1531810841
>>>>  stonith-enabled: true
>>>>
>>>> Quorum:
>>>>   Options:

On Wed, Jul 18, 2018 at 8:00 PM, wrote:
> Today's Topics:
>  1. Re: Weird Fencing Behavior (Andrei Borzenkov)
>  2. Re: Weird Fencing Behavior (Klaus Wenninger)
>
> Message: 1
> Date: Wed, 18 Jul 2018 07:22:25 +0300
> From: Andrei Borzenkov
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Weird Fencing Behavior
Re: [ClusterLabs] Weird Fencing Behavior
> > Hi,
> >
> > On my two-node active/passive setup, I configured fencing via
> > fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected
> > that both nodes would be stonithed simultaneously.
> >
> > On my test scenario, Node1 has the ClusterIP resource. When I disconnect
> > the service/corosync link physically, Node1 is fenced and Node2 stays
> > alive, given pcmk_delay=0 on both nodes.
> >
> > Can you explain the behavior above?
>
> #node1 could not connect to ESX because links were disconnected. As the
> #most obvious explanation.
>
> #You have logs, you are the only one who can answer this question with
> #some certainty. Others can only guess.

Oops, my bad. I forgot to tell: I have two interfaces on each virtual machine (node). The second interface is used for the ESX links, so fencing can be executed even though the corosync links are disconnected. Looking forward to your response. Thanks

#Having no fence delay means a death match (each node killing the other)
#is possible, but it doesn't guarantee that it will happen. Some of the
#time, one node will detect the outage and fence the other one before
#the other one can react.

#It's basically an Old West shoot-out -- they may reach for their guns
#at the same time, but one may be quicker.

#As Andrei suggested, the logs from both nodes could give you a timeline
#of what happened when.

Hi Andrei, kindly see the logs below. Based on the log timestamps, Node1 should have fenced Node2 first, but in the actual test Node1 was fenced/shut down by Node2. Is it possible, in a two-node active/passive pacemaker/corosync setup, for the node that gets disconnected (interface down) to be the only one that gets fenced? Thanks guys

*LOGS from Node2:*
Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor failed, forming new configuration.
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] A new membership (172.16.10.242:220) was formed. Members left: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] Failed to receive the leave message. failed: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [QUORUM] Members[1]: 2
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Removing all ArcosRhel1 attributes for peer loss
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Lost attribute writer ArcosRhel1
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Our DC node (ArcosRhel1) left the cluster
Jul 17 13:33:28 ArcosRhel2 pacemakerd[1074]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_NOT_DC -> S_ELECTION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_ELECTION -> S_INTEGRATION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 will be fenced because the node is no longer part of the cluster
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 is unclean
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action fence2_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action ClusterIP_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Scheduling Node ArcosRhel1 for STONITH
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move fence2#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move ClusterIP#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-20.bz2
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Requesting fencing (reboot) of node ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Initiating start operation fence2_start_0 locally on ArcosRhel2
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Client crmd.1084.cd70178e wants to fence (reboot) 'ArcosRhel1' with device '(any)'
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Requesting peer fencing (reboot) of ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Fence1
[ClusterLabs] Weird Fencing Behavior
> Hi,
>
> On my two-node active/passive setup, I configured fencing via
> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected
> that both nodes would be stonithed simultaneously.
>
> On my test scenario, Node1 has the ClusterIP resource. When I disconnect
> the service/corosync link physically, Node1 is fenced and Node2 stays
> alive, given pcmk_delay=0 on both nodes.
>
> Can you explain the behavior above?

#node1 could not connect to ESX because links were disconnected. As the
#most obvious explanation.

#You have logs, you are the only one who can answer this question with
#some certainty. Others can only guess.

Oops, my bad. I forgot to tell: I have two interfaces on each virtual machine (node). The second interface is used for the ESX links, so fencing can be executed even though the corosync links are disconnected. Looking forward to your response. Thanks

> See my config below:
>
> [root@ArcosRhel2 cluster]# pcs config
> Cluster Name: ARCOSCLUSTER
> Corosync Nodes:
>  ArcosRhel1 ArcosRhel2
> Pacemaker Nodes:
>  ArcosRhel1 ArcosRhel2
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=172.16.10.243
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>
> Stonith Devices:
>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: Fence1
>     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
>   Resource: fence2
>     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: ARCOSCLUSTER
>  dc-version: 1.1.16-12.el7-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1531810841
>  stonith-enabled: true
>
> Quorum:
>   Options:
[ClusterLabs] Weird Fencing Behavior?
Hi,

On my two-node active/passive setup, I configured fencing via fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I expected that both nodes would be stonithed simultaneously.

On my test scenario, Node1 has the ClusterIP resource. When I disconnect the service/corosync link physically, Node1 is fenced and Node2 stays alive, given pcmk_delay=0 on both nodes.

Can you explain the behavior above?

See my config below:

[root@ArcosRhel2 cluster]# pcs config
Cluster Name: ARCOSCLUSTER
Corosync Nodes:
 ArcosRhel1 ArcosRhel2
Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531810841
 stonith-enabled: true

Quorum:
  Options:
[ClusterLabs] What triggers fencing?
Message: 2
Date: Wed, 11 Jul 2018 16:33:31 +0200
From: Klaus Wenninger
To: Ken Gaillot, Cluster Labs - All topics related to open-source clustering welcomed, Andrei Borzenkov
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID: <2bf61b9f-98b0-482f-fa65-263ba9490...@redhat.com>
Content-Type: text/plain; charset=utf-8

On 07/11/2018 04:11 PM, Ken Gaillot wrote:
> On Wed, 2018-07-11 at 11:06 +0200, Klaus Wenninger wrote:
>> On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
>>> 11.07.2018 05:45, Confidential Company wrote:
>>>> Not true, the faster node will kill the slower node first. It is
>>>> possible that through misconfiguration, both could die, but it's rare
>>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>>> node you want to win.
>>>>
>>>> Don't use a delay on the other node, just the node you want to live in
>>>> such a case.
>>>>
>>>> **
>>>> 1. Given Active/Passive setup, resources are active on Node1
>>>> 2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
>>>> 3. Node2 goes down
> What do you mean by "down" in this case?
>
> If you mean the host itself has crashed, then it will not do anything,
> and node1 will fence it.
>
> If you mean node2's network goes out, so it's still functioning but no
> one can reach the managed service on it, then you are correct, the
> "wrong" node can get shot -- because you didn't specify anything about
> what the right node would be. This is a somewhat tricky area, but it
> can be done with a quorum-only node, qdisk, or fence_heuristics_ping,
> all of which are different ways of "preferring" the node that can reach
> a certain host.

Or in other words: why would I - as a cluster node - shoot the peer to be able to start the services locally, if I can somehow tell beforehand that my services wouldn't be reachable by anybody anyway (e.g. network disconnected)? Then it might make more sense to sit still and wait to be shot by the other side, for the case that that guy is luckier and has e.g. access to the network.

-Klaus

In the case of a 2-node setup, both nodes know nothing about whether their services are reachable by anybody. Sharing my config and my tests:

Last login: Thu Jul 12 14:57:21 2018
[root@ArcosRhel1 ~]# pcs config
Cluster Name: ARCOSCLUSTER
Corosync Nodes:
 ArcosRhel1 ArcosRhel2
Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.201 login=test passwd=testing pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.202 login=test passwd=testing pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531375458
 stonith-enabled: true

Quorum:
  Options:
[root@ArcosRhel1 ~]#

**Test scenario:
Given: nodes have two interfaces (ens192 for corosync traffic / ens224 for ESXi traffic)

a.) Node1=Active and Node2=Passive. Action = disconnect ens192 of Node1. Output = Node2 was fenced and shut down
b.) Node1=Passive and Node2=Active. Action = disconnect ens192 of Node1. Output = Node1 was fenced and shut down
c.) Node1=Passive and Node2=Active. Action = disconnect ens192 of Node2. Output = Node2 was fenced and shut down

Thanks,
imnotarobot

> If you mean the cluster-managed resource crashes on node2, but node2
> itself is still functioning properly, then what h
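[Editor's aside: a hedged sketch of the fence_heuristics_ping approach Ken mentions, using the device names from this thread; the heuristic device name ping_check is hypothetical, and the target IP is illustrative. The idea is a fencing level that runs the ping heuristic first, so the real fence device only fires if the node initiating fencing can still reach the target:]

```
# Level-1 fencing for each node: heuristic first, then the real device.
# If the initiating node cannot ping the target, the heuristic agent
# fails and the rest of that fencing level is not executed.
pcs stonith create ping_check fence_heuristics_ping ping_targets=172.16.10.1
pcs stonith level add 1 ArcosRhel1 ping_check,Fence1
pcs stonith level add 1 ArcosRhel2 ping_check,fence2
```

This way a node that has lost its own network tends not to win the shoot-out, which is the asymmetry the test scenarios above are missing.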
Re: [ClusterLabs] What triggers fencing?
Message: 1
Date: Wed, 11 Jul 2018 11:06:56 +0200
From: Klaus Wenninger
To: Cluster Labs - All topics related to open-source clustering welcomed, Andrei Borzenkov
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID:
Content-Type: text/plain; charset=utf-8

On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> 11.07.2018 05:45, Confidential Company wrote:
>> Not true, the faster node will kill the slower node first. It is
>> possible that through misconfiguration, both could die, but it's rare
>> and easily avoided with a 'delay="15"' set on the fence config for the
>> node you want to win.
>>
>> Don't use a delay on the other node, just the node you want to live in
>> such a case.
>>
>> **
>> 1. Given Active/Passive setup, resources are active on Node1
>> 2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
>> 3. Node2 goes down
>> 4. Node1 thinks Node2 is down / Node2 thinks Node1 is down
> If node2 is down, it cannot think anything.

True. Assuming it is not really down but just somehow disconnected for my answer below.

>> 5. fence1 counts 15 seconds before it fences Node1, while fence2 counts 30 seconds before it fences Node2
>> 6. Since fence1 has a shorter delay than fence2, fence1 executes and shuts down Node1.
>> 7. fence1 (action: shutdown Node1) will always trigger first because it has a shorter delay than fence2.
>>
>> ** Okay, what's important is that they should be different. But in the case
>> above, even though Node2 goes down, Node1 has the shorter delay, so Node1 gets
>> fenced/shut down. This is a sample scenario. I don't get the point. Can you
>> comment on this?

You didn't send the actual config, but from your description I get the scenario this way: fencing resource fence1 is running on Node2, it is there to fence Node1, and it has a delay of 15s. Fencing resource fence2 is running on Node1, it is there to fence Node2, and it has a delay of 30s.

If they now begin to fence each other at the same time, the node actually fenced would of course be Node1, as fencing resource fence1 is going to shoot 15s earlier than fence2. Looks consistent to me ...

Regards,
Klaus

***
Yes, that is right, Klaus. fence1 running on Node2 will fence Node1; fence1 will execute first whichever node goes down, because it has the shorter delay. But if Node2 goes down or is disconnected, how can it be fenced by Node1 using fence2, if fence2 cannot be triggered because fence1 always comes first? My point here is that giving a delay to fencing resolves the issue of double fencing, but it doesn't resolve, or doesn't know, which node should be fenced. Even though Node2 gets disconnected, Node1 will be fenced and the whole service goes down.

**Let me share my actual config: I have two ESXi hosts, 2 virtual machines, and 2 interfaces on each (1 = corosync interface, 1 = interface for the VM to contact its ESXi host)

Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.201 login=test passwd=testing pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1 ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.202 login=test passwd=testing pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2 ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531300540
 stonith-enabled: true
*

>>
>> Thanks
>>
>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger wrote:
>>> On 07/09/2018 05:53 PM, Digimer
Re: [ClusterLabs] What triggers fencing?
Not true, the faster node will kill the slower node first. It is possible that through misconfiguration both could die, but it's rare and easily avoided with a 'delay="15"' set on the fence config for the node you want to win.

Don't use a delay on the other node, just the node you want to live in such a case.

**
1. Given an Active/Passive setup, resources are active on Node1
2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
3. Node2 goes down
4. Node1 thinks Node2 is down / Node2 thinks Node1 is down
5. fence1 counts 15 seconds before it fences Node1, while fence2 counts 30 seconds before it fences Node2
6. Since fence1 has a shorter delay than fence2, fence1 executes and shuts down Node1.
7. fence1 (action: shutdown Node1) will always trigger first because it has a shorter delay than fence2.
**

Okay, what's important is that they should be different. But in the case above, even though Node2 goes down, Node1 has the shorter delay, so Node1 gets fenced/shut down. This is a sample scenario. I don't get the point. Can you comment on this?

Thanks

On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger wrote:
> On 07/09/2018 05:53 PM, Digimer wrote:
> > On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> >> On 07/09/2018 05:33 PM, Digimer wrote:
> >>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
> >>>> On 07/09/2018 03:49 PM, Digimer wrote:
> >>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> >>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Any ideas what triggers the fencing script or stonith?
> >>>>>>>
> >>>>>>> Given the setup below:
> >>>>>>> 1. I have two nodes
> >>>>>>> 2. Configured fencing on both nodes
> >>>>>>> 3. Configured delay=15 and delay=30 on fence1 (for Node1) and
> >>>>>>> fence2 (for Node2) respectively
> >>>>>>>
> >>>>>>> *What does it mean to configure a delay in stonith? Wait for 15 seconds
> >>>>>>> before it fences the node?
> >>>>>> Given that on a 2-node cluster you don't have real quorum to make one
> >>>>>> partial cluster fence the rest of the nodes, the different delays are meant
> >>>>>> to prevent a fencing race.
> >>>>>> Without different delays that would lead to both nodes fencing each
> >>>>>> other at the same time - finally both being down.
> >>>>> Not true, the faster node will kill the slower node first. It is
> >>>>> possible that through misconfiguration, both could die, but it's rare
> >>>>> and easily avoided with a 'delay="15"' set on the fence config for the
> >>>>> node you want to win.
> >>>> What exactly is not true? Aren't we saying the same?
> >>>> Of course one of the delays can be 0 (most important is that
> >>>> they are different).
> >>> Perhaps I misunderstood your message. It seemed to me that the
> >>> implication was that fencing in 2-node without a delay always ends up
> >>> with both nodes being down, which isn't the case. It can happen if the
> >>> fence methods are not set up right (ie: the node isn't set to immediately
> >>> power off on ACPI power button event).
> >> Yes, a misunderstanding I guess.
> >>
> >> Should have been more verbose in saying that due to the
> >> time between the fencing command fired off to the fencing
> >> device and the actual fencing taking place (as you state,
> >> dependent on how it is configured in detail - but a measurable
> >> time in all cases) there is a certain probability that when
> >> both nodes start fencing at roughly the same time we will
> >> end up with 2 nodes down.
> >>
> >> Everybody has to find his own tradeoff between how reliably
> >> fence races are prevented and the fencing delay, I guess.
> > We've used this:
> >
> > 1. IPMI (with the guest OS set to immediately power off) as primary,
> > with a 15 second delay on the active node.
> >
> > 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
> > for when IPMI fails, with no delay.
> >
> > In ~8 years, across dozens and dozens of clusters and countless fence
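[Editor's aside: Digimer's advice above, sketched as pcs commands with the device names used elsewhere in this thread. `delay` here is the static fence-agent delay, distinct from `pcmk_delay_max`; treat the exact values as illustrative.]

```
# Put the delay only on the device that TARGETS the node you want to
# survive. Fence1 targets ArcosRhel1, so delaying it gives ArcosRhel1
# a 15-second head start to shoot ArcosRhel2 in a split.
pcs stonith update Fence1 delay=15
# fence2 targets ArcosRhel2; leave it without a delay.
```

This breaks the symmetry of the shoot-out so only one outcome of a death match is likely, at the cost of a fixed 15s added to any legitimate fencing of ArcosRhel1.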
[ClusterLabs] What triggers fencing?
Hi,

Any ideas what triggers the fencing script or stonith?

Given the setup below:
1. I have two nodes
2. Configured fencing on both nodes
3. Configured delay=15 and delay=30 on fence1 (for Node1) and fence2 (for Node2) respectively

*What does it mean to configure a delay in stonith? Wait for 15 seconds before it fences the node?
*Given Node1 is active and Node2 goes down, does it mean fence1 will execute first and shut down Node1, even though it was Node2 that went down?

Thanks
imnotarobot
Re: [ClusterLabs] Resource-stickiness is not working
On Sat, 2018-06-02 at 22:14 +0800, Confidential Company wrote:
> On Fri, 2018-06-01 at 22:58 +0800, Confidential Company wrote:
> > Hi,
> >
> > I have a two-node active/passive setup. My goal is to fail over a
> > resource once a node goes down, with as little downtime as possible.
> > Based on my testing, when Node1 goes down it fails over to Node2. If
> > Node1 comes back up after link reconnection (reconnecting the physical
> > cable), the resource fails back to Node1 even though I configured
> > resource-stickiness. Is there something wrong with the configuration below?
> >
> > #service firewalld stop
> > #vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) ----- Private Network (Direct connect)
> > #systemctl start pcsd.service
> > #systemctl enable pcsd.service
> > #passwd hacluster --> define pw
> > #pcs cluster auth Node1 Node2
> > #pcs cluster setup --name Cluster Node1 Node2
> > #pcs cluster start --all
> > #pcs property set stonith-enabled=false
> > #pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
> > #pcs resource defaults resource-stickiness=100
> >
> > Regards,
> > imnotarobot
>
> Your configuration is correct, but keep in mind scores of all kinds
> will be added together to determine where the final placement is.
>
> In this case, I'd check that you don't have any constraints with a
> higher score preferring the other node. For example, if you previously
> did a "move" or "ban" from the command line, that adds a constraint
> that has to be removed manually if you no longer want it.
> --
> Ken Gaillot
>
> >>>>>>>>>>
> I'm confused. A constraint, from what I understand, means there's a
> preferred node. But if I want my resources not to have a preferred node,
> is that possible?
>
> Regards,
> imnotarobot

Yes, that's one type of constraint -- but you may not have realized you added one if you ran something like "pcs resource move", which is a way of saying there's a preferred node.

There are a variety of other constraints. For example, as you add more resources, you might say that resource A can't run on the same node as resource B, and if that constraint's score is higher than the stickiness, A might move if B starts on its node.

To see your existing constraints using pcs, run "pcs constraint show". If there are any you don't want, you can remove them with various pcs commands.
--
Ken Gaillot

>>>>>>>>>>
Correct me if I'm wrong. So the resource-stickiness policy cannot be used alone; a constraint configuration has to be set up to make it work, and the outcome depends on the relative scores of the two. Can you suggest what type of constraint configuration I should set to achieve the simple goal above?

Regards,
imnotarobot
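[Editor's aside: Ken's suggestion sketched as commands. The constraint id shown is the pattern pcs auto-generates for "pcs resource move ClusterIP"; your actual id may differ, so check the listing first.]

```
# List all constraints with their ids and scores
pcs constraint show --full

# Remove a leftover location constraint by its id
# (cli-prefer-<resource> is the id "pcs resource move" typically creates)
pcs constraint remove cli-prefer-ClusterIP

# Or, for move/ban leftovers, clear them in one step:
pcs resource clear ClusterIP
```

Once no location constraint outscores the stickiness of 100, the resource stays where it is after Node1 rejoins.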
[ClusterLabs] Resource-stickiness is not working
On Fri, 2018-06-01 at 22:58 +0800, Confidential Company wrote:
> Hi,
>
> I have a two-node active/passive setup. My goal is to fail over a
> resource once a node goes down, with as little downtime as possible.
> Based on my testing, when Node1 goes down the resource fails over to
> Node2. When Node1 comes back up (after the physical cable is
> reconnected), the resource fails back to Node1 even though I
> configured resource-stickiness. Is there something wrong with the
> configuration below?
>
> #service firewalld stop
> #vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) ----- Private Network (Direct connect)
> #systemctl start pcsd.service
> #systemctl enable pcsd.service
> #passwd hacluster --> define pw
> #pcs cluster auth Node1 Node2
> #pcs cluster setup --name Cluster Node1 Node2
> #pcs cluster start --all
> #pcs property set stonith-enabled=false
> #pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
> #pcs resource defaults resource-stickiness=100
>
> Regards,
> imnotarobot

Your configuration is correct, but keep in mind that scores of all kinds are added together to determine the final placement.

In this case, I'd check that you don't have any constraints with a higher score preferring the other node. For example, if you previously did a "move" or "ban" from the command line, that adds a constraint that has to be removed manually if you no longer want it.
--
Ken Gaillot

>>>>>>>>>>

I'm confused. A constraint, as I understand it, means there is a preferred node. But if I want my resources not to have a preferred node, is that possible?

Regards,
imnotarobot
[ClusterLabs] Resource-stickiness is not working
Hi,

I have a two-node active/passive setup. My goal is to fail over a resource once a node goes down, with as little downtime as possible. Based on my testing, when Node1 goes down the resource fails over to Node2. When Node1 comes back up (after the physical cable is reconnected), the resource fails back to Node1 even though I configured resource-stickiness. Is there something wrong with the configuration below?

#service firewalld stop
#vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) - Private Network (Direct connect)
#systemctl start pcsd.service
#systemctl enable pcsd.service
#passwd hacluster --> define pw
#pcs cluster auth Node1 Node2
#pcs cluster setup --name Cluster Node1 Node2
#pcs cluster start --all
#pcs property set stonith-enabled=false
#pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
#pcs resource defaults resource-stickiness=100

Regards,
imnotarobot
[ClusterLabs] ethmonitor is not working
Hi,

I have a two-node active/passive setup. This is my configuration:

#service firewalld stop
#vi /etc/hosts --> 192.168.2.121 (Node1) / 192.168.2.122 (Node2) - Private Network (Direct connect)
#systemctl start pcsd.service
#systemctl enable pcsd.service
#passwd hacluster --> define pw
#pcs cluster auth Node1 Node2
#pcs cluster setup --name Cluster Node1 Node2
#pcs cluster start --all
#pcs property set stonith-enabled=false
#pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
#pcs resource defaults resource-stickiness=100
#pcs resource create ens192-monitor ethmonitor interface=ens192 --clone
#pcs constraint location ClusterIP rule score=-INFINITY ethmonitor-ens192 ne 1

My goal is to have two interfaces: a service network (ens192) and a heartbeat network (ens224). Based on my research, ethmonitor is used to monitor an interface via pacemaker; if there is a failed link, resources fail over to the other node. My problem is that no errors are reported in "pcs status", the resource does not fail over, and the service (ClusterIP) goes down.

Testing scenario:
1. Disconnected the physical link of Node1 --> No error appears in "pcs status", and ClusterIP is not reachable because failover to the other node does not happen.

Any ideas? Is this a bug? Are there any missing configurations?

Regards,
imnotarobot
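One way to narrow this down (a sketch; the attribute and node names follow the configuration above) is to check whether the ethmonitor clone is actually updating the transient node attribute that the location rule depends on:

```shell
# ethmonitor publishes a transient node attribute named
# ethmonitor-<interface>; 1 means the link is up, 0 means it is down
crm_attribute --node Node1 --name ethmonitor-ens192 --query --lifetime reboot
crm_attribute --node Node2 --name ethmonitor-ens192 --query --lifetime reboot
```

The rule only forces the resource away when the attribute is present and not equal to 1, so if the value on Node1 never flips to 0 after the cable is pulled, the monitor operation on the clone (or its interval) is the place to look.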
[ClusterLabs] ethmonitor RA agent error. How can I fix this? (RHEL)
I have two virtual machines with two network interfaces each. See the configuration below:

*eth0 - service network
*eth1 - heartbeat network
*vi /etc/hosts - RhelA(ip of eth1) / RhelB(ip of eth1)
*service firewalld stop
*pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=(virtual ip) cidr_netmask=32 op monitor interval=30s
*pcs resource create eth1-monitor ethmonitor interface=eth1 --clone
*pcs constraint location VirtualIP rule score=-INFINITY ethmonitor-ens192 ne 1

+++

I tried to ifdown eth0 (service network). The result:
1. The VirtualIP resource switched to Node2.
2. Got an error from "pcs status": "unable to find nic".
3. Even after a successful failover, the error still exists.

Since it automatically switched to Node2, my goal is to fail over again to Node1. This is what I did:
1. Enabled eth0 of Node1 and waited 15 seconds.
2. Disabled eth0 of Node2.
3. The VirtualIP resource stopped.
4. Even after enabling eth0 of Node1, the error from the previous procedure still exists.
5. Got an additional error; I have two errors now.
6. The VirtualIP resource doesn't start.

Regards,
imnotarobot
[ClusterLabs] ethmonitor configuration
Hi,

1. What is the configuration for ethmonitor if I want to set up two NICs (data and heartbeat)? I have a VirtualIP resource, and this VirtualIP should only run on a node where the ethmonitor resource reports the ethernet device as up.
2. What is the minimum monitoring interval of the ethmonitor RA?

Regards,
imnotarobot
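The pattern used elsewhere in this thread can be sketched as follows (a sketch; the interface name eth0 and the resource names are illustrative and must match your actual data NIC and VIP resource):

```shell
# Monitor the data NIC on every node; ethmonitor publishes a node
# attribute named ethmonitor-eth0 (1 = link up, 0 = link down)
pcs resource create eth0-monitor ocf:heartbeat:ethmonitor interface=eth0 --clone

# Keep the VIP off any node whose data NIC is not reported as up
pcs constraint location VirtualIP rule score=-INFINITY ethmonitor-eth0 ne 1
```

The heartbeat NIC needs no ethmonitor resource of its own; corosync already detects its failure. Note that the attribute name in the rule must match the monitored interface exactly, or the rule never fires.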
[ClusterLabs] Two-node cluster fencing
Message: 2
Date: Sun, 13 May 2018 10:20:36 +0300
From: Andrei Borzenkov
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Two-node cluster fencing
Message-ID:
Content-Type: text/plain; charset=utf-8

12.05.2018 07:31, Confidential Company writes:
> Hi,
>
> This is my setup:
>
> 1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
> 2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
> 3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
> 4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
> 5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.

Then why do you need a failover cluster in the first place? Why can't you have both nodes running at the same time and use a virtual server to let clients connect to either of them?

*Reply to Andrei:* I want it to be as simple as possible. If I set them up as active/active, I need a load balancer (nginx, I assume). Or maybe you have suggestions. Thanks bro

> 6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
> 7. Seamless failover from primary to secondary is not required.
> 8. Active/passive setup
>
> Given the setup above,
> 1. Are there any drawbacks?
> 2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
> 3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?
> Thanks,
> imnotarobot
[ClusterLabs] Two-node cluster fencing
Hi Casey,

1. I tried shutting down my VM while testing; the "ClusterIP" resource switched automatically to the standby node (Node2).
2. I ran "systemctl enable corosync pacemaker" so that after a reboot, corosync and pacemaker start automatically.
3. As I turned Node1 back on, I experienced downtime (maybe the syncing of nodes causes downtime), but my cluster still works as expected --> the active node is still Node2.
4. If I choose ESXi as my fence device and the physical server goes down, would fencing still work, given that it's on one host?

Thanks Casey, I want to understand more about fencing.

Regards,
imnotarobot

>> Without fencing, if the primary is powered off abruptly (e.g. if one of your ESX servers crashes), the standby will not become primary, and you will need to promote it manually. We had exactly this scenario happen last week with a 2-node cluster. Without fencing, you don't have high availability. If you don't need high availability, you probably don't need pacemaker.
>>
>> There are instructions for setting up fencing with vmware here:
>> https://www.hastexo.com/resources/hints-and-kinks/fencing-vmware-virtualized-pacemaker-nodes/
>>
>> One note - rather than the SDK, I believe you actually need the CLI package, which can be found here:
>> https://my.vmware.com/web/vmware/details?downloadGroup=VCLI600&productId=491
>>
>> Good luck - I haven't managed to get it to build yet. vmware gives you a black-box installer script that compiles a bunch of dependent perl modules, and it ends up hung at 100% CPU usage for days. Digging into this further with lsof and friends, it seems to be prompting for where your apache source code is so it can compile mod_perl. Why does it need mod_perl for the CLI?? Anyway, I haven't managed to get past that roadblock yet. I'm using Ubuntu 16, so it may happen to just work better on your RHEL instances. If you have a different ESX version than 6.0, you may have better luck as well.
Best wishes,
--
Casey

> On May 11, 2018, at 10:31 PM, Confidential Company wrote:
>
> Hi,
>
> This is my setup:
>
> 1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
> 2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
> 3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
> 4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
> 5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.
> 6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
> 7. Seamless failover from primary to secondary is not required.
> 8. Active/passive setup
>
> Given the setup above,
> 1. Are there any drawbacks?
> 2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
> 3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?
>
> Thanks,
>
> imnotarobot
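For reference, per-node fence_vmware_soap devices along the lines of the hastexo article can be sketched roughly like this (a sketch only; the vCenter address, credentials, and names are placeholders -- "port" must be the VM's name in vSphere and pcmk_host_list the matching cluster node name):

```shell
# One fence device per node, each only allowed to fence the other VM
pcs stonith create fence-rhelA fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl_insecure=1 \
    port=RhelA pcmk_host_list=RhelA op monitor interval=60s

pcs stonith create fence-rhelB fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl_insecure=1 \
    port=RhelB pcmk_host_list=RhelB op monitor interval=60s

# Re-enable fencing once both devices verify successfully
pcs property set stonith-enabled=true
```

fence_vmware_soap talks to the vCenter/ESXi SOAP API directly, so it avoids the perl CLI build problems described above; verify each device with "fence_vmware_soap -a <ipaddr> -l <login> -p <passwd> -z -o list" before trusting it in the cluster.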
[ClusterLabs] Two-node cluster fencing
Hi,

This is my setup:

1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each.
2. On my physical machine, I have four vmnics --> vmnic 0,1 for uplinks going to switchA and switchB --> vmnic 2,3 for heartbeat corosync traffic (direct connect to the other ESXi host)
3. I plan on clustering my two virtual machines via corosync and creating a virtual IP via pacemaker.
4. I plan on using the uplink interface for data and the totem interface for corosync packets (heartbeat messages).
5. These two virtual machines don't need shared storage or a shared LUN, because the application is, by nature, a standalone application that doesn't need a centralized location, as it does not store any data that needs to be synchronized between the two servers.
6. I have a PC that only needs to contact the virtual IP of the RHEL virtual servers.
7. Seamless failover from primary to secondary is not required.
8. Active/passive setup

Given the setup above,
1. Are there any drawbacks?
2. Do I need fencing? Can you explain by giving a scenario based on the above setup? What will happen if I don't put in a fence device?
3. If I need a fence device, what fence device do you recommend? SAN, VMware, or PDU?

Thanks,

imnotarobot