Hi, I just checked if the VIP resource brings up the network device and it turns out it doesn't.
I created a simple cluster with one VIP resource: MDA1PFP-S01 09:06:34 2115 0 ~ # pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02 Shutting down pacemaker/corosync services... Redirecting to /bin/systemctl stop pacemaker.service Redirecting to /bin/systemctl stop corosync.service Killing any remaining services... Removing all cluster configuration files... MDA1PFP-PCS01: Succeeded MDA1PFP-PCS02: Succeeded Synchronizing pcsd certificates on nodes MDA1PFP-PCS01, MDA1PFP-PCS02... MDA1PFP-PCS01: Success MDA1PFP-PCS02: Success Restaring pcsd on the nodes in order to reload the certificates... MDA1PFP-PCS01: Success MDA1PFP-PCS02: Success MDA1PFP-S01 09:06:40 2116 0 ~ # pcs cluster start --all MDA1PFP-PCS01: Starting Cluster... MDA1PFP-PCS02: Starting Cluster... MDA1PFP-S01 09:06:41 2117 0 ~ # sleep 5 rm -f mda; pcs cluster cib mda pcs -f mda property set no-quorum-policy=ignore pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=32 nic=bond0 op monitor interval=1s pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50 MDA1PFP-S01 09:06:46 2118 0 ~ # crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME MDA1PFP-S01 09:06:46 2119 0 ~ # crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update BACKUP MDA1PFP-S01 09:06:46 2120 0 ~ # pcs property set stonith-enabled=false MDA1PFP-S01 09:06:47 2121 0 ~ # rm -f mda; pcs cluster cib mda MDA1PFP-S01 09:06:47 2122 0 ~ # pcs -f mda property set no-quorum-policy=ignore MDA1PFP-S01 09:06:47 2123 0 ~ # MDA1PFP-S01 09:06:47 2123 0 ~ # pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=32 nic=bond0 op monitor interval=1s MDA1PFP-S01 09:06:47 2124 0 ~ # pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50 MDA1PFP-S01 09:06:47 2125 0 ~ # pcs cluster cib-push mda CIB updated Now, I bring down the network device, wait for the failure and the restart. MDA1PFP-S01 09:06:48 2126 0 ~ # ifdown bond0 Last updated: Mon Sep 19 09:10:29 2016 Last change: Mon Sep 19 09:07:03 2016 by hacluster via crmd on MDA1PFP-PCS01 Stack: corosync Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum 2 nodes and 1 resource configured Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ] mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01 Failed Actions: * mda-ip_monitor_1000 on MDA1PFP-PCS01 'not running' (7): call=7, status=complete, exitreason='none', last-rc-change='Mon Sep 19 09:07:54 2016', queued=0ms, exec=0ms After the restart ifconfig still shows the device bond0 to be not RUNNING: MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST> mtu 1500 inet 192.168.120.20 netmask 255.255.255.255 broadcast 0.0.0.0 ether a6:17:2c:2a:72:fc txqueuelen 30000 (Ethernet) RX packets 2034 bytes 286728 (280.0 KiB) RX errors 0 dropped 29 overruns 0 frame 0 TX packets 2284 bytes 355975 (347.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255 ether 52:54:00:74:d9:39 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 Pinging another node via that device fails as expected: MDA1PFP-S01 09:08:00 2128 0 ~ # ping pf-pep-dev-1 PING pf-pep-dev-1 (192.168.120.1) 56(84) bytes of data. ^C --- pf-pep-dev-1 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 2999ms The question is why the monitor operation detects the failure once after bringing the device down, but then restarts it and does not detect any further errors. Best wishes, Jens -- Jens Auer | CGI | Software-Engineer CGI (Germany) GmbH & Co. KG Rheinstraße 95 | 64295 Darmstadt | Germany T: +49 6151 36860 154 jens.a...@cgi.com Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter de.cgi.com/pflichtangaben. CONFIDENTIALITY NOTICE: Proprietary/Confidential information belonging to CGI Group Inc. and its affiliates may be contained in this message. If you are not a recipient indicated or intended in this message (or responsible for delivery of this message to such person), or you think for any reason that this message may have been addressed to you in error, you may not use or copy or deliver this message to anyone else. In such case, you should destroy this message and are asked to notify the sender by reply e-mail. ________________________________________ Von: Jan Pokorný [jpoko...@redhat.com] Gesendet: Freitag, 16. September 2016 23:13 An: users@clusterlabs.org Betreff: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device On 16/09/16 11:01 -0500, Ken Gaillot wrote: > On 09/16/2016 10:43 AM, Auer, Jens wrote: >> thanks for the help. >> >>> I'm not sure what you mean by "the device the virtual ip is attached >>> to", but a separate question is why the resource agent reported that >>> restarting the IP was successful, even though that device was >>> unavailable. If the monitor failed when the device was made unavailable, >>> I would expect the restart to fail as well. >> >> I created the virtual ip with parameter nic=bond0, and this is the >> device I am bringing down and was referring to in my question. I >> think the current behavior is a little inconsistent. I bring down >> the device and pacemaker recognizes this and restarts the resource. >> However, the monitor then should fail again, but it just doesn't >> detect any problems. > > That is odd. Pacemaker is just acting on what the resource agent > reports, so the issue will be in the agent. I'd note that IPaddr2 agent attempts to bring the network interface (back) up if not already on start so this appears, perhaps against one's liking and expectations (if putting it down is considered a sufficiently big hammer to observe a service failover), as a magic "self-healing" :-) Would "rmmod <interface-driver-module>" be a better hammer of choice? -- Jan (Poki) _______________________________________________ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org