Public bug reported:

This seems to be collateral from the fix for
https://bugs.launchpad.net/neutron/+bug/1850779

I think this fix causes problems. We have multiple nodes in DVR_SNAT mode, and the SNAT namespace is scheduled to one of them. When the l3-agent is restarted on the other nodes, initialize() is now always invoked for DvrEdgeRouter, which creates the SNAT namespace prematurely. This in turn causes external_gateway_added() to later detect that this host is NOT hosting the SNAT router but that the namespace exists, so it removes the namespace by triggering external_gateway_removed() (dvr_edge_router --> dvr_local_router).

The problem is that the dvr_local_router code for external_gateway_removed() ends up DELETING the rfp/fpr pair, which severs the qrouter connection to the FIP namespace (and deletes all the FIP routes in the FIP namespace as a result).

Prior to this bug fix, _create_snat_namespace was only invoked in _create_dvr_gateway(), which was only invoked when the node was actually hosting SNAT for the router.

Even without the breaking issue of deleting the rtr_2_fip link, this fix unnecessarily creates the SNAT namespace on every host, only for it to be deleted.

FYI, this is for non-HA routers.

1. Where the qrouter to FIP link is deleted:
   https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_local_router.py#L599
   This results in connectivity breakage.

2. #1 above is triggered by this code in the edge router, which sees the SNAT namespace even though SNAT is scheduled to a different host:
   https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_edge_router.py#L56

3. The SNAT namespace is created on the wrong host because of the bug fix for 1850779, which moved its creation to DvrEdgeRouter initialization.

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: l3-dvr-backlog l3-ha

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926531

Title:
  SNAT namespace prematurely created then deleted on hosts, resulting in removal of RFP/FPR link to FIP namespace

Status in neutron:
  New
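
The restart sequence described in this report can be illustrated with a small standalone Python model. This is only a sketch of the control flow as the report describes it, not neutron's actual code: the classes LocalRouter and EdgeRouter, the helper restart_agent(), the create_snat_ns_on_init flag, the host names and the example FIP address are all hypothetical stand-ins chosen for illustration.

```python
# Toy model of the control flow described in this bug report.
# These classes only mimic the reported behaviour; they are NOT neutron code.


class LocalRouter:
    """Hypothetical stand-in for the dvr_local_router behaviour."""

    def __init__(self, host, snat_host):
        self.host = host
        self.snat_host = snat_host          # host the SNAT role is scheduled to
        self.rtr_2_fip_link = True          # rfp/fpr pair to the FIP namespace
        self.fip_routes = ["203.0.113.10"]  # made-up example FIP route

    def external_gateway_removed(self):
        # Per the report: this path deletes the rfp/fpr pair and the FIP routes.
        self.rtr_2_fip_link = False
        self.fip_routes = []


class EdgeRouter(LocalRouter):
    """Hypothetical stand-in for the dvr_edge_router behaviour."""

    def __init__(self, host, snat_host, create_snat_ns_on_init):
        super().__init__(host, snat_host)
        self.snat_ns_exists = False
        # True models the behaviour after the fix for 1850779,
        # False models the behaviour before it.
        self.create_snat_ns_on_init = create_snat_ns_on_init

    def initialize(self):
        if self.create_snat_ns_on_init:
            self.snat_ns_exists = True      # created on every host

    def external_gateway_added(self):
        if self.host == self.snat_host:
            self.snat_ns_exists = True      # legitimate SNAT host
        elif self.snat_ns_exists:
            # Namespace exists but SNAT is scheduled elsewhere: clean up.
            self.external_gateway_removed()

    def external_gateway_removed(self):
        super().external_gateway_removed()  # also tears down the FIP link
        self.snat_ns_exists = False


def restart_agent(host, snat_host, create_snat_ns_on_init):
    """Replay the l3-agent restart sequence for a single router."""
    router = EdgeRouter(host, snat_host, create_snat_ns_on_init)
    router.initialize()
    router.external_gateway_added()
    return router


if __name__ == "__main__":
    after_fix = restart_agent("node-2", "node-1", create_snat_ns_on_init=True)
    before_fix = restart_agent("node-2", "node-1", create_snat_ns_on_init=False)
    print("after fix, FIP link intact: ", after_fix.rtr_2_fip_link)
    print("before fix, FIP link intact:", before_fix.rtr_2_fip_link)
```

In this model, a host that is not the SNAT host loses its FIP link only when the namespace is created during initialize(); the host that SNAT is scheduled to is unaffected in both cases.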

