Reviewed: https://review.opendev.org/c/openstack/neutron/+/949217 Committed: https://opendev.org/openstack/neutron/commit/e69505d293ef12999f948531acbde75e16a65cd4 Submitter: "Zuul (22348)" Branch: master
commit e69505d293ef12999f948531acbde75e16a65cd4 Author: Bence Romsics <bence.roms...@gmail.com> Date: Wed May 7 16:29:58 2025 +0200 Limit trunk ACTIVE state hack to OVN In https://review.opendev.org/c/openstack/neutron/+/853779 we started moving a trunk to ACTIVE when its parent port went to ACTIVE. The intention was to not leave the trunk in DOWN after a live migration as reported in #1988549. However this had side effects. Earlier we moved a trunk to ACTIVE when all of its ports were processed. That means we unintentionally changed the meaning of the trunk ACTIVE status. This affected all backends and not just live migrate but create too. This change moves the logic of propagating the trunk parent's ACTIVE to the trunk itself to the OVN trunk driver, so we limit the undesired effects to ml2/ovn. By that we restore the original meaning of trunk ACTIVE for all non-OVN backends. Ideally we would want to limit the effect to live migrate (so we don't affect create) but I did not find a way to do that. Change-Id: I4d2c3db355e29fffcce0f50cd12bb1e31d1be43a Closes-Bug: #2095152 Related-Bug: #1988549 Related-Change: https://review.opendev.org/c/openstack/os-vif/+/949736 ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/2095152 Title: ovs-agent: Leftover tpi/spi interfaces after VM boot/delete with trunk port(s) Status in neutron: Fix Released Status in os-vif: Fix Released Status in os-vif 2024.1 series: Triaged Status in os-vif 2024.2 series: Triaged Status in os-vif 2025.1 series: Triaged Status in os-vif 2025.2 series: Fix Released Bug description: We have seen tpi- and spi- interfaces in ovs not deleted by ovs-agent when they should have been deleted already. At the moment I only have a reproduction based on chance with wildly varying frequency of the error symptoms: ovs-dump() { for bridge in $( sudo ovs-vsctl list-br ) do for port in $( sudo ovs-vsctl list-ports $bridge ) do echo $bridge $port done done | sort } ovs-dump > ovs-state.0 for j in $( seq 1 10 ) do openstack network create tnet0 openstack subnet create --network tnet0 --subnet-range 10.0.100.0/24 tsubnet0 openstack port create --network tnet0 tport0 openstack network trunk create --parent-port tport0 trunk0 tport0_mac="$( openstack port show tport0 -f value -c mac_address )" for i in $( seq 1 30 ) do openstack network create tnet$i openstack subnet create --network tnet$i --subnet-range 10.0.$(( 100 + $i )).0/24 tsubnet$i openstack port create --network tnet$i --mac-address "$tport0_mac" tport$i openstack network trunk set --subport port=tport$i,segmentation-type=vlan,segmentation-id=$(( 100 + $i )) trunk0 done openstack server create --flavor cirros256 --image cirros-0.6.3-x86_64-disk --nic port-id=tport0 tvm0 --wait # Theoretically not needed, but still make sure we don't interrupt anything work in progress to make the repro more uniform. while [ "$( openstack network trunk show trunk0 -f value -c status )" != "ACTIVE" ] do sleep 1 done openstack server delete tvm0 --wait openstack network trunk delete trunk0 openstack port list -f value -c ID -c Name | awk '/tport/ { print $1 }' | xargs -r openstack port delete openstack net list -f value -c ID -c Name | awk '/tnet/ { print $1 }' | xargs -r openstack net delete done sleep 10 ovs-dump > ovs-state.1 diff -u ovs-state.{0,1} One example output with j=1..20 and i=1..30: --- ovs-state.0 2025-01-16 13:31:07.881407421 +0000 +++ ovs-state.1 2025-01-16 14:52:45.323392243 +0000 @@ -8,9 +8,27 @@ br-int qr-88029aef-01 br-int sg-73e24638-69 br-int sg-e45cf925-de +br-int spi-1eeb4ae6-1b +br-int spi-2093a8c2-df +br-int spi-2d9ae883-d9 +br-int spi-3f17d563-cd +br-int spi-9c0d9c98-d8 +br-int spi-a2dc4baf-ef +br-int spi-af2efafa-39 +br-int spi-c14e8bc3-62 +br-int spi-c16959f8-da +br-int spi-e90d4d84-31 br-int tap03961474-06 br-int tap3e6a6311-95 br-int tpi-1f8b5666-bf +br-int tpi-2477b06f-5d +br-int tpi-4421d69a-be +br-int tpi-572a3af8-42 br-int tpi-9cf24ba1-ba +br-int tpi-9e60cb66-5e +br-int tpi-a533a27b-78 +br-int tpi-cddcaa7b-15 +br-int tpi-d7cd2e3e-e6 +br-int tpi-e68ca29d-4d br-physnet0 phy-br-physnet0 br-tun patch-int These ports are not even cleaned up by an ovs-agent restart. During the runs I have not found ERROR messages in ovs-agent logs. The amount of ports left behind varies wildly. I have seen cases when more than 50% of vm start/deletes left behind one tpi port. But I have also seen cases when I had to have ten runs (j=1..10) to see the first leftover interface. This makes me believe there's a causal factor present here (probably timing based) I don't understand and cannot control yet. I want to get back to analyse the root cause, however I hope that first I can find a quicker and more reliable reproduction method so it becomes easier to work with this. devstack 2f3440dc neutron 8cca47f2e7 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/2095152/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp