Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

Auer, Jens Tue, 20 Sep 2016 04:50:39 -0700

Hi,

I've updated to resource-agents 3.9.7, which is the latest stable version, but I
am still seeing the same issues:
MDA1PFP-S01 11:31:40 2495 130 ~ # yum list resource-agents
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
resource-agents.x86_64    3.9.7-4.el7    @/resource-agents-3.9.7-4.el7.x86_64

ifdown still shows the same behavior. Initially, I can see two IP addresses
assigned to device bond0. After running "ifdown bond0" on the command line,
Pacemaker restarts the resource "successfully", but the default IP address is
not assigned to the device:
25: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN qlen 30000
    link/ether 46:0a:be:70:36:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.120.20/32 scope global bond0
       valid_lft forever preferred_lft forever

The log says that IPaddr2 assigns 192.168.120.20 to bond0, but nothing else:
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: Removing slave eno49
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: Releasing active interface eno49
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: the permanent HWaddr of eno49 - 5c:b9:01:9c:e7:fc - is still in use by bond0 - set the HWaddr of eno49 to a different address to avoid conflicts
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: making interface eno50 the new active one
Sep 20 11:34:25 MDA1PFP-S01 kernel: ixgbe 0000:04:00.0: removed PHC on eno49
Sep 20 11:34:25 MDA1PFP-S01 NetworkManager[881]: <info>  (bond0): bond slave eno49 was released
Sep 20 11:34:25 MDA1PFP-S01 NetworkManager[881]: <info>  (eno49): released from master bond0
Sep 20 11:34:26 MDA1PFP-S01 kernel: bond0: Removing slave eno50
Sep 20 11:34:26 MDA1PFP-S01 kernel: bond0: Releasing active interface eno50
Sep 20 11:34:26 MDA1PFP-S01 kernel: ixgbe 0000:04:00.1: removed PHC on eno50
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]: <info>  (bond0): bond slave eno50 was released
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]: <info>  (eno50): released from master bond0
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]: <info>  (eno50): link disconnected
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]: <info>  (bond0): link disconnected
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for 192.168.120.10 on bond0.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Leaving mDNS multicast group on interface bond0.IPv4 with address 192.168.120.10.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for 192.168.120.20 on bond0.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Leaving mDNS multicast group on interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Interface bond0.IPv4 no longer relevant for mDNS.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for fe80::5eb9:1ff:fe9c:e7fc on bond0.
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Marking ringid 1 interface 192.168.120.10 FAULTY
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32025]: INFO: IP status = no, IP_CIP=
Sep 20 11:34:29 MDA1PFP-S01 crmd[30188]:  notice: Operation mda-ip_stop_0: ok (node=MDA1PFP-PCS01, call=9, rc=0, cib-update=17, confirmed=true)
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: Adding inet address 192.168.120.20/32 to device bond0
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: Bringing device bond0 up
Sep 20 11:34:29 MDA1PFP-S01 kernel: IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: New relevant interface bond0.IPv4 for mDNS.
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: Registering new address record for 192.168.120.20 on bond0.IPv4.
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used not_used
Sep 20 11:34:29 MDA1PFP-S01 crmd[30188]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=10, rc=0, cib-update=18, confirmed=true)

The VIP is reachable locally, but not from other hosts:
MDA1PFP-S01 11:36:12 2526 0 ~ # ping 192.168.120.20
PING 192.168.120.20 (192.168.120.20) 56(84) bytes of data.
64 bytes from 192.168.120.20: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 192.168.120.20: icmp_seq=2 ttl=64 time=0.016 ms
64 bytes from 192.168.120.20: icmp_seq=3 ttl=64 time=0.029 ms

MDA1PFP-S02 11:33:31 1273 0 ~ # ping 192.168.120.20
PING 192.168.120.20 (192.168.120.20) 56(84) bytes of data.
From 192.168.120.11 icmp_seq=10 Destination Host Unreachable
From 192.168.120.11 icmp_seq=11 Destination Host Unreachable
From 192.168.120.11 icmp_seq=12 Destination Host Unreachable
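A rough illustration, not something from this thread: the bond0 header line in the `ip addr` output earlier in this message already hints at why only local pings succeed. After ifdown released both slaves, the bond reports NO-CARRIER and `state DOWN`, so no frames leave the host, while a ping to one of the host's own addresses is delivered locally without using the link at all. The helpers `parse_link_header` and `link_can_carry_traffic` are hypothetical; the sketch only parses the one-line header format shown above and is not part of IPaddr2 or any cluster tool.

```python
import re

# Header line copied from the `ip addr` output earlier in this message.
BOND0_HEADER = (
    "25: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 "
    "qdisc noqueue state DOWN qlen 30000"
)

# Matches "<index>: <name>: <FLAG,FLAG,...> ... state <STATE> ..."
HEADER_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:\s]+):\s+<(?P<flags>[^>]*)>.*\bstate\s+(?P<state>\S+)"
)


def parse_link_header(line: str) -> dict:
    """Return interface name, flag set and operational state of an `ip addr` header line."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"not an ip addr header line: {line!r}")
    return {
        "name": m.group("name"),
        "flags": set(m.group("flags").split(",")),
        "state": m.group("state"),
    }


def link_can_carry_traffic(line: str) -> bool:
    """Hypothetical check: administratively UP is not enough, the link also needs carrier.

    A VIP on a link that fails this check still answers pings sent from the
    host itself (local delivery), but not pings from other hosts.
    """
    info = parse_link_header(line)
    return (
        "UP" in info["flags"]
        and "NO-CARRIER" not in info["flags"]
        and info["state"] != "DOWN"
    )


if __name__ == "__main__":
    info = parse_link_header(BOND0_HEADER)
    print(info["name"], info["state"], sorted(info["flags"]))
    print("can carry traffic:", link_can_carry_traffic(BOND0_HEADER))
```

In other words, the start operation reporting rc=0 only says the address was added; it says nothing about whether the underlying link can reach the rest of the network.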

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
[email protected]
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found
at de.cgi.com/pflichtangaben.

________________________________________
From: Ken Gaillot [[email protected]]
Sent: Monday, 19 September 2016 17:31
To: [email protected]
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

On 09/19/2016 10:04 AM, Jan Pokorný wrote:
> On 19/09/16 10:18 +0000, Auer, Jens wrote:
>> Ok, after reading the log files again I found
>>
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: Initiating action 3: stop mda-ip_stop_0 on MDA1PFP-PCS01 (local)
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: MDA1PFP-PCS01-mda-ip_monitor_1000:14 [ ocf-exit-reason:Unknown interface [bond0] No such device.\n ]
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: ERROR: Unknown interface [bond0] No such device.
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: WARNING: [findif] failed
>> Sep 19 10:03:45 MDA1PFP-S01 lrmd[7794]:  notice: mda-ip_stop_0:8745:stderr [ ocf-exit-reason:Unknown interface [bond0] No such device. ]
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: Operation mda-ip_stop_0: ok (node=MDA1PFP-PCS01, call=16, rc=0, cib-update=49, confirmed=true)
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-501.bz2): Complete
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]:  notice: On loss of CCM Quorum: Ignore
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]: warning: Processing failed op monitor for mda-ip on MDA1PFP-PCS01: not configured (6)
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]:   error: Preventing mda-ip from re-starting anywhere: operation monitor failed 'not configured' (6)
>>
>> I think that explains why the resource is not started on the other
>> node, but I am not sure this is a good decision. It seems to be a
>> little harsh to prevent the resource from starting anywhere,
>> especially considering that the other node will be able to start the
>> resource.

The resource agent is supposed to return "not configured" only when the
*pacemaker* configuration of the resource is inherently invalid, so
there's no chance of it starting anywhere.
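To make that distinction concrete, here is a small sketch of how Pacemaker classifies the standard OCF error codes when deciding on recovery, following the return-code table in "Pacemaker Explained" for the 1.1 series (please check the documentation for your version). rc=6 (OCF_ERR_CONFIGURED), which IPaddr2 returned in the log quoted below, is the only "fatal" code, which is why the policy engine prevented mda-ip from starting anywhere; a "hard" error only bans the node that reported it, and a "soft" error leads to a restart or a move. The names `Recovery`, `OCF_ERRORS` and `allowed_nodes_after_failure` are illustrative, not a Pacemaker API, and the second node name is assumed.

```python
from enum import Enum


class Recovery(Enum):
    SOFT = "soft"    # transient: restart in place, move after migration-threshold
    HARD = "hard"    # node-specific: banned from the node that reported it
    FATAL = "fatal"  # configuration error: banned from every node


# Standard OCF error codes (rc 1..6) and the recovery type listed for each
# in Pacemaker Explained (1.1) when the result was unexpected.
OCF_ERRORS = {
    1: ("OCF_ERR_GENERIC", Recovery.SOFT),
    2: ("OCF_ERR_ARGS", Recovery.HARD),
    3: ("OCF_ERR_UNIMPLEMENTED", Recovery.HARD),
    4: ("OCF_ERR_PERM", Recovery.HARD),
    5: ("OCF_ERR_INSTALLED", Recovery.HARD),
    6: ("OCF_ERR_CONFIGURED", Recovery.FATAL),
}


def allowed_nodes_after_failure(rc: int, failed_node: str, nodes: list[str]) -> list[str]:
    """Illustrative only: nodes still eligible to run the resource after a failed op."""
    _name, recovery = OCF_ERRORS[rc]
    if recovery is Recovery.FATAL:
        return []  # "Preventing ... from re-starting anywhere"
    if recovery is Recovery.HARD:
        return [n for n in nodes if n != failed_node]
    return list(nodes)  # soft: failed node stays eligible until migration-threshold


if __name__ == "__main__":
    nodes = ["MDA1PFP-PCS01", "MDA1PFP-PCS02"]  # second node name assumed
    # rc=6 from the IPaddr2 monitor in the quoted log
    print(allowed_nodes_after_failure(6, "MDA1PFP-PCS01", nodes))
    # what a node-local error such as OCF_ERR_INSTALLED would allow instead
    print(allowed_nodes_after_failure(5, "MDA1PFP-PCS01", nodes))
```

So a missing interface on one node would normally be expected to produce a node-local failure, letting the resource move; reporting it as "not configured" turns it into a cluster-wide ban.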

As Jan suggested, make sure you've applied any resource-agents updates.
If that doesn't fix it, it sounds like a bug in the agent, or something
really is wrong with your pacemaker resource config.

>
> The problem to start with is that based on
>
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: ERROR: Unknown interface [bond0] No such device.
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: WARNING: [findif] failed
>
> you may be using too ancient a version of resource-agents:
>
> https://github.com/ClusterLabs/resource-agents/pull/320
>
> so until you update, the troubleshooting would be quite moot.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
