Hi,

one thing to add is that everything works as expected when I physically unplug 
the network cables to force a failover. 

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at
de.cgi.com/pflichtangaben.

CONFIDENTIALITY NOTICE: Proprietary/Confidential information belonging to CGI 
Group Inc. and its affiliates may be contained in this message. If you are not 
a recipient indicated or intended in this message (or responsible for delivery 
of this message to such person), or you think for any reason that this message 
may have been addressed to you in error, you may not use or copy or deliver 
this message to anyone else. In such case, you should destroy this message and 
are asked to notify the sender by reply e-mail.

________________________________________
From: Auer, Jens [jens.a...@cgi.com]
Sent: Tuesday, 20 September 2016 13:44
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

Hi,

I've decided to write two answers for the two problems. The cluster still
fails to relocate the resource after unloading the modules, even with
resource-agents 3.9.7:
MDA1PFP-S01 11:42:50 2533 0 ~ # yum list resource-agents
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
resource-agents.x86_64    3.9.7-4.el7    @/resource-agents-3.9.7-4.el7.x86_64

Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Transition 5 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Complete
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: Ignore
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Stop    mda-ip     (MDA1PFP-PCS01)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-553.bz2
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Initiating action 2: stop mda-ip_stop_0 on MDA1PFP-PCS01 (local)
Sep 20 11:42:52 MDA1PFP-S01 IPaddr2(mda-ip)[15336]: INFO: IP status = no, IP_CIP=
Sep 20 11:42:52 MDA1PFP-S01 lrmd[13905]:  notice: mda-ip_stop_0:15336:stderr [ Device "bond0" does not exist. ]
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Operation mda-ip_stop_0: ok (node=MDA1PFP-PCS01, call=18, rc=0, cib-update=48, confirmed=true)
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 96 98
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 98 9a 9c
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Marking ringid 1 interface 192.168.120.10 FAULTY
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 98 9c 9f a1
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: Ignore
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Forcing mda-ip away from MDA1PFP-PCS01 after 1000000 failures (max=1000000)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-554.bz2
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: Transition 7 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-554.bz2): Complete
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: Ignore
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]: warning: Forcing mda-ip away from MDA1PFP-PCS01 after 1000000 failures (max=1000000)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-555.bz2
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: Transition 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-555.bz2): Complete
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
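For what it's worth, rc 6 in these logs is OCF_ERR_CONFIGURED, which pacemaker treats as fatal, hence the "Preventing mda-ip from re-starting anywhere" messages. One way to narrow this down is to run the resource agent by hand, outside pacemaker; a rough sketch below, where the parameter values are only guesses taken from the logs and need to be adjusted to the actual resource definition:

```shell
# Run the IPaddr2 agent directly to see why start returns rc 6
# ("not configured"). Values below are assumptions from the logs.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_ip=192.168.120.20      # assumed VIP
export OCF_RESKEY_nic=bond0              # interface from the logs

/usr/lib/ocf/resource.d/heartbeat/IPaddr2 validate-all; echo "rc=$?"
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 start;        echo "rc=$?"

# Once the cause is fixed, clear the failure history so pacemaker
# will consider the node again (rc 6 banned the resource from it):
pcs resource cleanup mda-ip
```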

Cheers,
  Jens


________________________________________
From: Auer, Jens [jens.a...@cgi.com]
Sent: Monday, 19 September 2016 16:36
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

Hi,

>> After the restart ifconfig still shows the device bond0 to be not RUNNING:
>> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
>> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
>>         inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
>>         ether a6:17:2c:2a:72:fc  txqueuelen 30000  (Ethernet)
>>         RX packets 2034  bytes 286728 (280.0 KiB)
>>         RX errors 0  dropped 29  overruns 0  frame 0
>>         TX packets 2284  bytes 355975 (347.6 KiB)
>>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

There does seem to be a difference: the device is not RUNNING there, while on the spare machine it is:
mdaf-pf-pep-spare 14:17:53 999 0 ~ # ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 192.168.120.10  netmask 255.255.255.0  broadcast 192.168.120.255
        inet6 fe80::5eb9:1ff:fe9c:e7fc  prefixlen 64  scopeid 0x20<link>
        ether 5c:b9:01:9c:e7:fc  txqueuelen 30000  (Ethernet)
        RX packets 15455692  bytes 22377220306 (20.8 GiB)
        RX errors 0  dropped 2392  overruns 0  frame 0
        TX packets 14706747  bytes 21361519159 (19.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Also, the netmask and the IP address are wrong. I have configured the device
with address 192.168.120.10 and netmask 255.255.255.0. How IPaddr2 ends up with
the wrong configuration, I have no idea.

>Anyway, you should rather be using "ip" command from iproute suite
>than various if* tools that come short in some cases:
>http://inai.de/2008/02/19
>This would also be consistent with IPaddr2 uses under the hood.

We are using RedHat 7, which uses either NetworkManager or the network
scripts. We use the latter, so ifup/ifdown should be the correct way to manage
the network card. I also tried ip link set dev bond0 up/down, and it brings up
the device with the correct IP address and netmask.
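For reference, a small sketch of the same checks done purely with iproute2; ifconfig's RUNNING flag roughly corresponds to LOWER_UP (carrier present) in ip output, and "bond0" here is just the interface name from this thread:

```shell
# Check whether the interface has carrier (ifconfig's RUNNING flag
# roughly corresponds to LOWER_UP in iproute2 output):
if ip -o link show bond0 | grep -q 'LOWER_UP'; then
    echo "bond0 is RUNNING (carrier present)"
else
    echo "bond0 is up but not RUNNING (no carrier)"
fi

# Cycle the device with iproute2 instead of ifdown/ifup:
ip link set dev bond0 down
ip link set dev bond0 up

# Show the address and prefix the kernel actually has, which is
# what IPaddr2 consults, independent of the ifcfg files:
ip -4 addr show dev bond0
```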

Best wishes,
  Jens


________________________________________
From: Jan Pokorný [jpoko...@redhat.com]
Sent: Monday, 19 September 2016 14:57
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

On 19/09/16 09:15 +0000, Auer, Jens wrote:
> After the restart ifconfig still shows the device bond0 to be not RUNNING:
> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
>         inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
>         ether a6:17:2c:2a:72:fc  txqueuelen 30000  (Ethernet)
>         RX packets 2034  bytes 286728 (280.0 KiB)
>         RX errors 0  dropped 29  overruns 0  frame 0
>         TX packets 2284  bytes 355975 (347.6 KiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

This seems to suggest bond0 interface is up and address-assigned
(well, the netmask is strange).  So there would be nothing
contradictory to what I said on the address of IPaddr2.

Anyway, you should rather be using "ip" command from iproute suite
than various if* tools that come short in some cases:
http://inai.de/2008/02/19
This would also be consistent with what IPaddr2 uses under the hood.

--
Jan (Poki)

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

