I have now completed testing the bionic-proposed keepalived package with OpenStack Queens and am happy that it resolves the problem: keepalived now tears down routes, VIPs, eVIPs etc. when it comes back up and transitions from master to backup.

My test consisted of deploying Queens with 3 gateways, creating 100 users/projects each with 1 router, creating some instances with floating IPs, then forcibly killing both the keepalived and neutron-keepalived-state-change processes associated with a particular router for which I have an instance with a floating IP. I then observed that the qrouter namespace interfaces for that router were unconfigured and that the VRRP transition happened as expected. This is in contrast to e.g. keepalived 1:1.2.19-1ubuntu0.2, available with all Xenial releases of OpenStack, for which I consistently see the qrouter interfaces remain configured on > 1 gateway.
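For anyone wanting to repeat the "configured on > 1 gateway" check, here is a minimal sketch of one way to do it. It assumes you have already collected, per gateway, the output of `ip -o -4 addr show` run inside the router's qrouter namespace. The function name, the gw1/gw2/gw3 host names, the qg-example interface and the 192.0.2.10 address are all illustrative and not taken from my deployment.

```python
import ipaddress


def gateways_holding_vip(outputs, vip):
    """Return the sorted hosts whose namespace still has `vip` configured.

    `outputs` maps a host name to the text of `ip -o -4 addr show`
    collected inside that host's qrouter namespace (collection not shown).
    """
    target = ipaddress.ip_address(vip)
    holders = []
    for host, text in outputs.items():
        for line in text.splitlines():
            fields = line.split()
            # One-line format: "<idx>: <ifname> inet <addr>/<prefix> ..."
            if "inet" not in fields:
                continue
            pos = fields.index("inet") + 1
            if pos >= len(fields):
                continue
            addr = fields[pos].split("/")[0]
            if ipaddress.ip_address(addr) == target:
                holders.append(host)
                break
    return sorted(holders)


# Hypothetical sample data: only gw1 still holds the address.
SAMPLE = {
    "gw1": "2: qg-example    inet 192.0.2.10/24 scope global qg-example\\"
           "       valid_lft forever preferred_lft forever\n",
    "gw2": "1: lo    inet 127.0.0.1/8 scope host lo\\"
           "       valid_lft forever preferred_lft forever\n",
    "gw3": "",
}

holders = gateways_holding_vip(SAMPLE, "192.0.2.10")
print(holders)
```

A result with more than one host is the broken behaviour described above; with the fixed keepalived I would expect at most one host after the transition settles.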
For completeness (although it has no bearing on the keepalived fix), I also still see the other issue on bionic whereby neutron lists the router as active on > 1 host, e.g. (truncated so that it displays properly):

+-//---------------------------+---------+----------------+-------+----------+
| // id                        | host    | admin_state_up | alive | ha_state |
+-//---------------------------+---------+----------------+-------+----------+
| //901-4edd-86fb-8dbfe7373255 | crustle | True           | :-)   | active   |
| //961-4318-9743-775ebc9b0067 | chespin | True           | :-)   | active   |
| //628-4c2e-8e91-c309e4477c75 | orgen   | True           | :-)   | standby  |
+-//---------------------------+---------+----------------+-------+----------+

The reason for this is simple, and the good news is that with the fixed keepalived it is also benign. Neutron detects state changes by running ip monitor on the qrouter interfaces. Since my test involved killing both neutron-keepalived-state-change (which runs ip monitor) and keepalived, the VRRP transition appears to have happened before neutron had ip monitor running again. Looking at the l3-agent logs I see:

2018-07-25 10:19:33.636 14018 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 75d24bfb-9807-4216-af4a-3aac37cf2417
2018-07-25 10:19:33.638 14018 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-75d24bfb-9807-4216-af4a-3aac37cf2417', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/
2018-07-25 10:19:33.886 14018 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-75d24bfb-9807-4216-af4a-3aac37cf2417', 'neutron-keepalived-state-change', '--router_id=75d24

i.e.
neutron starts keepalived BEFORE neutron-keepalived-state-change, so if the transition and teardown happen before the latter comes up and launches ip monitor, it never sees the changes and has nothing to report to neutron.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

https://bugs.launchpad.net/bugs/1744062
Title: [SRU] L3 HA: multiple agents are active at the same time
