Good morning. I’m curious whether anyone is successfully running OpenShift in an environment where they manage their own DHCP clients and scopes. Our infrastructure recently had an issue and we are struggling to find the root cause. In our environment we run two vip-manager pods, which manage two IP addresses.
One of our suspicions is that keepalived doesn’t play nicely with DHCP. For example, if the DHCP client dies or renews its IP address, the vip-manager pod recognizes the event: it logs that the VIP it is managing, as well as the IP assigned to the node, has been removed. However, keepalived continues to send out VRRP advertisements as if it were still MASTER for that IP. This puts us in a bad spot, because the BACKUP keepalived never takes the IP address over and the IP is no longer assigned to anything.

Here is example log output from the pod on which I forced this failure:

10.0.0.1 == address assigned to node via DHCP
10.0.0.2 == address assigned to vip_manager_VIP_1
10.0.0.3 == address assigned to vip_manager_VIP_2
10.1.4.1 == lbr0/tun0

  - Loading ip_vs module ...
  - Checking if ip_vs module is available ...
ip_vs 140944 0
  - Module ip_vs is loaded.
  - Generating and writing config to /etc/keepalived/keepalived.conf
  - Starting failover services ...
Starting Healthcheck child process, pid=136
Initializing ipvs 2.6
Starting VRRP child process, pid=137
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Registering Kernel netlink reflector
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Opening file '/etc/keepalived/keepalived.conf'.
Configuration is using : 8733 Bytes
Truncating auth_pass to 8 characters
Truncating auth_pass to 8 characters
Configuration is using : 73522 Bytes
Using LinkWatch kernel netlink reflector...
VRRP_Instance(vip_manager_VIP_1) Entering BACKUP STATE
VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(9,10)]
VRRP_Instance(vip_manager_VIP_2) Transition to MASTER STATE
VRRP_Instance(vip_manager_VIP_2) Entering FAULT STATE
VRRP_Script(chk_vip_manager) succeeded
VRRP_Instance(vip_manager_VIP_2) prio is higher than received advert
VRRP_Instance(vip_manager_VIP_2) Transition to MASTER STATE
VRRP_Instance(vip_manager_VIP_2) Received lower prio advert, forcing new election
VRRP_Instance(vip_manager_VIP_2) Entering MASTER STATE
VRRP_Instance(vip_manager_VIP_2) setting protocol VIPs.
Netlink reflector reports IP 10.0.0.3 added
VRRP_Instance(vip_manager_VIP_2) Sending gratuitous ARPs on eno16780032 for 10.0.0.3
VRRP_Instance(vip_manager_VIP_2) Sending gratuitous ARPs on eno16780032 for 10.0.0.3

...<dhclient renews the ip address>...

Netlink reflector reports IP 10.0.0.1 removed
Netlink reflector reports IP 10.0.0.1 removed
Netlink reflector reports IP 10.0.0.3 removed
Netlink reflector reports IP 10.0.0.3 removed
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.0.0.1 added

At this point the other vip-manager pod is still receiving VRRP advertisements for 10.0.0.3 and therefore never takes over the IP address, so effectively half of our traffic (depending on DNS round-robin) is lost. Our recovery options are to restart the network, which stops the VRRP packets long enough to cause a failover, or to restart the affected pod.

The version of keepalived provided by RHEL is 10 minor revisions behind; I’m curious whether there may be a benefit to getting this package updated. Pending any advice, my next troubleshooting step would be to build my own version of the vip-manager with an upgraded keepalived to see whether this issue persists.

--
John Skarbek
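One possible mitigation for the behavior described above is a keepalived track script that forces the instance into FAULT state when the node believes it is MASTER but its VIP has vanished (e.g. removed by a dhclient renew), so the backup can win the next election. A plain VIP check alone would also fail on the BACKUP node (which legitimately does not hold the VIP), so the sketch below combines a notify hook, which records the VRRP state, with a check that only enforces the VIP while MASTER. This is a sketch, not a tested fix: the VIP, interface, and state-file path are assumptions taken from the log excerpt, and the script would need to be wired into keepalived.conf via `notify` and `vrrp_script`/`track_script`.

```shell
#!/bin/sh
# Hypothetical keepalived helper; VIP, interface, and paths are
# assumptions based on the log excerpt above.
STATE_FILE="${STATE_FILE:-/tmp/vip_manager_VIP_2.state}"  # /var/run in production
VIP="${VIP:-10.0.0.3}"
IFACE="${IFACE:-eno16780032}"

# Notify hook: keepalived calls it with TYPE NAME STATE; we record STATE.
record_state() {
  echo "$3" > "$STATE_FILE"
}

# True if the VIP is currently assigned to the interface.
vip_present() {
  ip -4 addr show dev "$IFACE" 2>/dev/null | grep -qw "$VIP"
}

# Track-script check: fail only when we think we are MASTER but the
# VIP is gone; a BACKUP never holds the VIP, so it always passes.
check_vip() {
  state=$(cat "$STATE_FILE" 2>/dev/null || echo BACKUP)
  if [ "$state" = "MASTER" ] && ! vip_present; then
    return 1   # MASTER without its VIP: force FAULT, let the backup take over
  fi
  return 0
}

case "$1" in
  notify) shift; record_state "$@" ;;
  check)  check_vip; exit $? ;;
esac
```

In keepalived.conf this would be referenced roughly as `notify "/path/to/script.sh notify"` inside the vrrp_instance block, plus a `vrrp_script` running `/path/to/script.sh check` tracked via `track_script` — again, as a sketch to experiment with rather than a known-good configuration.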
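A blunter workaround, if the root cause really is that keepalived never notices the renew, would be to bounce the vip-manager pod from a dhclient exit hook so keepalived re-reads the node's addresses and a clean election runs. The sketch below assumes RHEL's dhclient-script, which sources /etc/dhcp/dhclient-exit-hooks with $reason set to the lease event; the restart command is an assumption and would need to be whatever restarts the pod in your environment.

```shell
#!/bin/sh
# Sketch of /etc/dhcp/dhclient-exit-hooks (sourced by RHEL's
# dhclient-script with $reason set). RESTART_CMD is an assumption;
# substitute whatever restarts the vip-manager pod for you.
RESTART_CMD="${RESTART_CMD:-docker restart vip-manager}"

handle_lease_event() {
  case "$1" in
    BOUND|RENEW|REBIND|REBOOT)
      # Lease changed: restart vip-manager so keepalived re-elects.
      logger -t dhclient-hook "lease event $1: restarting vip-manager" 2>/dev/null || :
      $RESTART_CMD >/dev/null 2>&1 || :
      return 0 ;;
    *)
      return 1 ;;   # not a lease-change event; do nothing
  esac
}

handle_lease_event "$reason" || :
```

The obvious cost is a brief VIP outage on every renew, so this is only a stopgap while the keepalived behavior is investigated.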
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
