Public bug reported:

Right now, there's a chance that deleting a port in Neutron with ML2/OVN deletes the object from the Neutron DB while leaving a stale port in the OVN NB database.

This can happen when deleting a port [0] raises a RowNotFound exception. While that may look like the port was already gone from OVN NB, the truth is that the current port_delete function can throw that exception for different reasons (especially against OVN < 2.10, where Address Sets were used instead of Port Groups). For example, the exception can be raised if some ACL or Address Set doesn't exist [1][2], among other cases. When that happens, the revision number of the object is deleted [3] and the port stays stale forever in OVN NB (the maintenance task will skip it).

One of the main impacts of this issue is that the OVN NB database grows and accumulates stale objects that the maintenance task does not detect (the neutron-ovn-db-sync script does detect them). More importantly, multiple ports in the same OVN Logical Switch may end up with the same IP address, which leaves legitimate ports without metadata. As per the metadata agent code [4], if more than one port in the same network has the same IP address, a 404 is returned to the instance when it requests metadata.

The workaround is to run the neutron-ovn-db-sync script in repair mode to get rid of the stale ports.

A proper fix would involve handling the exceptions that can happen around a port deletion at a finer granularity and acting accordingly on each of them. In the worst case, we wouldn't delete the revision number if the port still exists, leaving it up to the maintenance task to fix later on (< 5 minutes). Ideally, we should identify all possible code paths and delete the port from OVN whenever possible, even if some other associated operation fails (with proper logging).

Also, this scenario seems to be more likely under high concurrency of API operations (such as those generated by Heat) and possibly when Port Groups are not supported by the schema (OVN < 2.10).
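The proposed fix can be illustrated with a minimal, self-contained sketch. It is not the actual Neutron code: RowNotFound, FakeNB, delete_acls, the revisions dict and this delete_port function are hypothetical stand-ins chosen for illustration. The point it tries to show is that cleanup of associated rows is best effort, and that the revision number is only dropped once the port is really gone from the NB database.

```python
import logging

LOG = logging.getLogger(__name__)


class RowNotFound(Exception):
    """Hypothetical stand-in for the OVSDB 'row not found' error."""


class FakeNB:
    """Tiny in-memory stand-in for the OVN NB database (illustration only)."""

    def __init__(self, ports, acls):
        self.ports = set(ports)
        self.acls = set(acls)

    def delete_acls(self, port_id):
        if port_id not in self.acls:
            raise RowNotFound("ACL for %s" % port_id)
        self.acls.discard(port_id)

    def delete_port(self, port_id):
        if port_id not in self.ports:
            raise RowNotFound("port %s" % port_id)
        self.ports.discard(port_id)


def delete_port(nb, revisions, port_id):
    """Delete a port from NB; drop its revision only if the port is gone.

    Returns True when the port no longer exists in NB, False otherwise.
    """
    # Associated cleanup is best effort: a missing ACL must not block
    # the deletion of the port itself.
    try:
        nb.delete_acls(port_id)
    except RowNotFound:
        LOG.warning("ACLs for port %s not found, continuing", port_id)

    try:
        nb.delete_port(port_id)
    except RowNotFound:
        LOG.debug("Deleting port %s raised RowNotFound", port_id)

    if port_id in nb.ports:
        # Keep the revision entry so a periodic task can retry later.
        LOG.warning("Port %s still present in NB, keeping revision", port_id)
        return False

    revisions.pop(port_id, None)
    return True
```

With this shape, a RowNotFound coming from a missing ACL no longer leaves the port behind, and a port that could not be removed keeps its revision entry so that a periodic task still has something to act on.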
Danie Alvarez

[0] https://github.com/openstack/neutron/blob/99774a0465bce893e0b7178fe83fe1985432c704/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L719
[1] https://github.com/openstack/neutron/blob/99774a0465bce893e0b7178fe83fe1985432c704/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L680
[2] https://github.com/openstack/neutron/blob/99774a0465bce893e0b7178fe83fe1985432c704/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L690
[3] https://github.com/openstack/neutron/blob/99774a0465bce893e0b7178fe83fe1985432c704/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L722
[4] https://github.com/openstack/neutron/blob/99774a0465bce893e0b7178fe83fe1985432c704/neutron/agent/ovn/metadata/server.py#L86

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: ovn

** Tags added: ovn

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1874733

Title:
  [OVN] Stale ports can be present in OVN NB leading to metadata errors

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1874733/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

