Moving back to New for neutron for the time being since we think this may be fixed in keepalived.
** Description changed:

+ [Impact]
+
  This is the same issue reported in
  https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked
  as 'Fix Released' and the issue is still occurring and I can't change
  back to 'New' so it seems best to just open a new bug.

  It seems as if this bug surfaces due to load issues. While the fix
- provided by Venkata (https://review.openstack.org/#/c/522641/) should
- help clean things up at the time of l3 agent restart, issues seem to
- come back later down the line in some circumstances. xavpaice mentioned
- he saw multiple routers active at the same time when they had 464
- routers configured on 3 neutron gateway hosts using L3HA, and each
- router was scheduled to all 3 hosts. However, jhebden mentions that
- things seem stable at the 400 L3HA router mark, and it's worth noting
- this is the same deployment that xavpaice was referring to.
+ provided by Venkata in https://bugs.launchpad.net/neutron/+bug/1731595
+ (https://review.openstack.org/#/c/522641/) should help clean things up
+ at the time of l3 agent restart, issues seem to come back later down the
+ line in some circumstances. xavpaice mentioned he saw multiple routers
+ active at the same time when they had 464 routers configured on 3
+ neutron gateway hosts using L3HA, and each router was scheduled to all 3
+ hosts. However, jhebden mentions that things seem stable at the 400 L3HA
+ router mark, and it's worth noting this is the same deployment that
+ xavpaice was referring to.

- It seems to me that something is being pushed to it's limit, and
- possibly once that limit is hit, master router advertisements aren't
- being received, causing a new master to be elected. If this is the case
- it would be great to get to the bottom of what resource is getting
- constrained.
+ keepalived has a patch upstream in 1.4.0 that provides a fix for
+ removing left-over addresses if keepalived aborts. That patch will be
+ cherry-picked to Ubuntu keepalived packages.
+
+ [Test Case]
+ The following SRU process will be followed:
+ https://wiki.ubuntu.com/OpenStackUpdates
+
+ In order to avoid regression of existing consumers, the OpenStack team
+ will run their continuous integration test against the packages that are
+ in -proposed. A successful run of all available tests will be required
+ before the proposed packages can be let into -updates.
+
+ The OpenStack team will be in charge of attaching the output summary of
+ the executed tests. The OpenStack team members will not mark
+ ‘verification-done’ until this has happened.
+
+ [Regression Potential]
+ The regression potential is lowered as the fix is cherry-picked without change from upstream. In order to mitigate the regression potential, the results of the aforementioned tests are attached to this bug.
+
+ [Discussion]

** Changed in: neutron (Ubuntu)
       Status: Triaged => New

** Changed in: neutron (Ubuntu)
   Importance: High => Undecided

** Changed in: neutron (Ubuntu Xenial)
   Importance: High => Undecided

** Changed in: neutron (Ubuntu Xenial)
       Status: Triaged => New

** Changed in: neutron (Ubuntu Bionic)
   Importance: High => Undecided

** Changed in: neutron (Ubuntu Bionic)
       Status: Triaged => New

-- 
https://bugs.launchpad.net/bugs/1744062

Title:
  [SRU] L3 HA: multiple agents are active at the same time