I just checked if the VIP resource brings up the network device and it turns
out it doesn't.
I created a simple cluster with one VIP resource:
MDA1PFP-S01 09:06:34 2115 0 ~ # pcs cluster setup --name MDA1PFP
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
Synchronizing pcsd certificates on nodes MDA1PFP-PCS01, MDA1PFP-PCS02...
Restarting pcsd on the nodes in order to reload the certificates...
MDA1PFP-S01 09:06:40 2116 0 ~ # pcs cluster start --all
MDA1PFP-PCS01: Starting Cluster...
MDA1PFP-PCS02: Starting Cluster...
MDA1PFP-S01 09:06:41 2117 0 ~ # sleep 5
rm -f mda; pcs cluster cib mda
pcs -f mda property set no-quorum-policy=ignore
pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20
cidr_netmask=32 nic=bond0 op monitor interval=1s
pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
MDA1PFP-S01 09:06:46 2118 0 ~ # crm_attribute --type nodes --node MDA1PFP-PCS01
--name ServerRole --update PRIME
MDA1PFP-S01 09:06:46 2119 0 ~ # crm_attribute --type nodes --node MDA1PFP-PCS02
--name ServerRole --update BACKUP
MDA1PFP-S01 09:06:46 2120 0 ~ # pcs property set stonith-enabled=false
MDA1PFP-S01 09:06:47 2121 0 ~ # rm -f mda; pcs cluster cib mda
MDA1PFP-S01 09:06:47 2122 0 ~ # pcs -f mda property set no-quorum-policy=ignore
MDA1PFP-S01 09:06:47 2123 0 ~ #
MDA1PFP-S01 09:06:47 2123 0 ~ # pcs -f mda resource create mda-ip
ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=32 nic=bond0 op monitor interval=1s
MDA1PFP-S01 09:06:47 2124 0 ~ # pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
MDA1PFP-S01 09:06:47 2125 0 ~ # pcs cluster cib-push mda
Now, I bring down the network device and wait for the failure and the restart.
MDA1PFP-S01 09:06:48 2126 0 ~ # ifdown bond0
Last updated: Mon Sep 19 09:10:29 2016 Last change: Mon Sep 19
09:07:03 2016 by hacluster via crmd on MDA1PFP-PCS01
Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
Failed Actions:
* mda-ip_monitor_1000 on MDA1PFP-PCS01 'not running' (7): call=7,
last-rc-change='Mon Sep 19 09:07:54 2016', queued=0ms, exec=0ms
After the restart, ifconfig still shows device bond0 as not RUNNING:
MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST> mtu 1500
inet 192.168.120.20 netmask 255.255.255.255 broadcast 0.0.0.0
ether a6:17:2c:2a:72:fc txqueuelen 30000 (Ethernet)
RX packets 2034 bytes 286728 (280.0 KiB)
RX errors 0 dropped 29 overruns 0 frame 0
TX packets 2284 bytes 355975 (347.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:74:d9:39 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Pinging another node via that device fails as expected:
MDA1PFP-S01 09:08:00 2128 0 ~ # ping pf-pep-dev-1
PING pf-pep-dev-1 (192.168.120.1) 56(84) bytes of data.
--- pf-pep-dev-1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
The question is why the monitor operation detects the failure once after
bringing the device down, but then restarts the resource and does not detect
any further failures.
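To illustrate why the monitor can keep reporting success: the IPaddr2 monitor essentially verifies that the address is still assigned to the interface, not that the link is actually up and passing traffic. A minimal sketch of that kind of check follows; the simulated `ip -o -4 addr show` output and the simplified logic are illustrative assumptions, not the agent's actual code:

```shell
# Simulated one-line output of `ip -o -4 addr show dev bond0`, captured
# while the link was down (the address survived `ifdown` in this case):
addr_out='2: bond0    inet 192.168.120.20/32 scope global bond0'

# An address-presence check, which is the essence of what the monitor tests:
if printf '%s\n' "$addr_out" | grep -q '192\.168\.120\.20'; then
  monitor_rc=0   # OCF_SUCCESS: address present; link state never consulted
else
  monitor_rc=7   # OCF_NOT_RUNNING: address gone
fi
echo "monitor exit code: $monitor_rc"   # prints: monitor exit code: 0
```

As long as `ifdown` leaves the address configured on the device, an address-presence check cannot fail, which would explain the clean monitor results after the restart.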
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
From: Jan Pokorný [jpoko...@redhat.com]
Sent: Friday, 16 September 2016 23:13
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down
On 16/09/16 11:01 -0500, Ken Gaillot wrote:
> On 09/16/2016 10:43 AM, Auer, Jens wrote:
>> thanks for the help.
>>> I'm not sure what you mean by "the device the virtual ip is attached
>>> to", but a separate question is why the resource agent reported that
>>> restarting the IP was successful, even though that device was
>>> unavailable. If the monitor failed when the device was made unavailable,
>>> I would expect the restart to fail as well.
>> I created the virtual ip with parameter nic=bond0, and this is the
>> device I am bringing down and was referring to in my question. I
>> think the current behavior is a little inconsistent. I bring down
>> the device and pacemaker recognizes this and restarts the resource.
>> However, the monitor then should fail again, but it just doesn't
>> detect any problems.
> That is odd. Pacemaker is just acting on what the resource agent
> reports, so the issue will be in the agent.
I'd note that the IPaddr2 agent attempts to bring the network interface
(back) up on start if it is not already, so this appears, perhaps against
one's liking and expectations (if putting it down is considered
a sufficiently big hammer to observe a service failover),
as a magic "self-healing" :-)
Would "rmmod <interface-driver-module>" be a better hammer of choice?
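Building on that suggestion, here is a sketch of harsher failure injections than `ifdown`, which remove the address itself so an address-presence monitor genuinely fails. The commands are only recorded and printed by a `plan` helper rather than executed; the `bonding` module name is an assumption, and the real commands should only be run as root on a disposable test node:

```shell
NIC=${NIC:-bond0}
planned=""
plan() {                        # record and print each command instead of
  planned="$planned$* ; "       # executing it; swap the body for "$@" to
  echo "+ $*"                   # actually run them (root, test box only)
}
plan ip addr flush dev "$NIC"   # delete every address on the device
plan ip link set "$NIC" down    # and take the link itself down
plan rmmod bonding              # unload the driver entirely (name assumed)
```

With the address flushed, IPaddr2 has nothing left to "self-heal" around, so the monitor should fail persistently and force a real failover.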