Public bug reported:

This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595; however, that bug is marked 'Fix Released' and the issue is still occurring. I can't change it back to 'New', so it seems best to just open a new bug.
It seems as if this bug surfaces due to load issues. While the fix
provided by Venkata (https://review.openstack.org/#/c/522641/) should
help clean things up at the time of l3 agent restart, issues seem to
come back later down the line in some circumstances. xavpaice mentioned
he saw multiple routers active at the same time when they had 464
routers configured on 3 neutron gateway hosts using L3HA, and each
router was scheduled to all 3 hosts. However, jhebden mentions that
things seem stable at the 400 L3HA router mark, and it's worth noting
this is the same deployment that xavpaice was referring to.

It seems to me that something is being pushed to its limit, and
possibly once that limit is hit, master router advertisements aren't
being received, causing a new master to be elected. If this is the case
it would be great to get to the bottom of what resource is getting
constrained.

** Affects: cloud-archive
     Importance: High
         Status: Triaged

** Affects: cloud-archive/mitaka
     Importance: High
         Status: Triaged

** Affects: cloud-archive/newton
     Importance: High
         Status: Triaged

** Affects: cloud-archive/ocata
     Importance: High
         Status: Triaged

** Affects: cloud-archive/pike
     Importance: High
         Status: Triaged

** Affects: cloud-archive/queens
     Importance: High
         Status: Triaged

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: neutron (Ubuntu)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Xenial)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Artful)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Bionic)
     Importance: High
         Status: Triaged

** Description changed:

  This is the same issue as
  https://bugs.launchpad.net/neutron/+bug/1731595 however that bug is 'Fix
  Released' and the issue is still occurring. There are a lot of details
- in the linked bug so I won't add them here unless it's useful.
+ in the linked bug so I won't add too many here.
+ 
+ It seems as if this bug surfaces due to load issues. While the fix
+ provided by Venkata (https://review.openstack.org/#/c/522641/) should
+ help clean things up at the time of l3 agent restart, issues seem to
+ come back later down the line in some circumstances. xavpaice mentioned
+ he saw multiple routers active at the same time when they had 464
+ routers configured on 3 neutron gateway hosts using L3HA, and each
+ router was scheduled to all 3 hosts. However, jhebden mentions that
+ things seem stable at the 400 L3HA router mark, and it's worth noting
+ this is the same deployment that xavpaice was referring to.
+ 
+ It seems to me that something is being pushed to its limit, and
+ possibly once that limit is hit, master router advertisements aren't
+ being received, causing a new master to be elected. If this is the case
+ it would be great to get to the bottom of what resource is getting
+ constrained.

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Description changed:

- This is the same issue as
- https://bugs.launchpad.net/neutron/+bug/1731595 however that bug is 'Fix
- Released' and the issue is still occurring. There are a lot of details
- in the linked bug so I won't add too many here.
- 
- It seems as if this bug surfaces due to load issues. While the fix
- provided by Venkata (https://review.openstack.org/#/c/522641/) should
- help clean things up at the time of l3 agent restart, issues seem to
- come back later down the line in some circumstances. xavpaice mentioned
- he saw multiple routers active at the same time when they had 464
- routers configured on 3 neutron gateway hosts using L3HA, and each
- router was scheduled to all 3 hosts. However, jhebden mentions that
- things seem stable at the 400 L3HA router mark, and it's worth noting
- this is the same deployment that xavpaice was referring to.
- 
- It seems to me that something is being pushed to its limit, and
- possibly once that limit is hit, master router advertisements aren't
- being received, causing a new master to be elected. If this is the case
- it would be great to get to the bottom of what resource is getting
- constrained.
+ -

** No longer affects: neutron

** Summary changed:

- L3 HA: multiple agents are active at the same time
+ -

** Changed in: neutron (Ubuntu)
       Status: New => Incomplete

** Summary changed:

- -
+ L3 HA: multiple agents are active at the same time

** Description changed:

- -
+ This is the same issue reported in
+ https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked
+ as 'Fix Released' and the issue is still occurring.
+ 
+ It seems as if this bug surfaces due to load issues. While the fix
+ provided by Venkata (https://review.openstack.org/#/c/522641/) should
+ help clean things up at the time of l3 agent restart, issues seem to
+ come back later down the line in some circumstances. xavpaice mentioned
+ he saw multiple routers active at the same time when they had 464
+ routers configured on 3 neutron gateway hosts using L3HA, and each
+ router was scheduled to all 3 hosts. However, jhebden mentions that
+ things seem stable at the 400 L3HA router mark, and it's worth noting
+ this is the same deployment that xavpaice was referring to.
+ 
+ It seems to me that something is being pushed to its limit, and
+ possibly once that limit is hit, master router advertisements aren't
+ being received, causing a new master to be elected. If this is the case
+ it would be great to get to the bottom of what resource is getting
+ constrained.
** Changed in: neutron (Ubuntu)
       Status: Incomplete => Triaged

** Changed in: neutron (Ubuntu)
   Importance: Undecided => High

** Also affects: neutron
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ocata
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/pike
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/newton
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Bionic)
   Importance: High
       Status: Triaged

** Also affects: neutron (Ubuntu Artful)
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/mitaka
   Importance: Undecided => High

** Changed in: cloud-archive/mitaka
       Status: New => Triaged

** Changed in: cloud-archive/newton
   Importance: Undecided => High

** Changed in: cloud-archive/newton
       Status: New => Triaged

** Changed in: cloud-archive/ocata
   Importance: Undecided => High

** Changed in: cloud-archive/ocata
       Status: New => Triaged

** Changed in: cloud-archive/pike
   Importance: Undecided => High

** Changed in: cloud-archive/pike
       Status: New => Triaged

** Changed in: cloud-archive/queens
   Importance: Undecided => High

** Changed in: cloud-archive/queens
       Status: New => Triaged

** Changed in: neutron (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: neutron (Ubuntu Artful)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu Artful)
       Status: New => Triaged

** Description changed:

  This is the same issue reported in
  https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked
- as 'Fix Released' and the issue is still occurring.
+ as 'Fix Released' and the issue is still occurring and I can't change
+ back to 'New' so it seems best to just open a new bug.
  
  It seems as if this bug surfaces due to load issues. While the fix
  provided by Venkata (https://review.openstack.org/#/c/522641/) should
  help clean things up at the time of l3 agent restart, issues seem to
  come back later down the line in some circumstances. xavpaice mentioned
  he saw multiple routers active at the same time when they had 464
  routers configured on 3 neutron gateway hosts using L3HA, and each
  router was scheduled to all 3 hosts. However, jhebden mentions that
  things seem stable at the 400 L3HA router mark, and it's worth noting
  this is the same deployment that xavpaice was referring to.
  
  It seems to me that something is being pushed to its limit, and
  possibly once that limit is hit, master router advertisements aren't
  being received, causing a new master to be elected. If this is the case
  it would be great to get to the bottom of what resource is getting
  constrained.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1744062

Title:
  L3 HA: multiple agents are active at the same time

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
