** Description changed:

  [ Impact ]
  
  Starting with libvirt 9.5.0, the default behavior for TAP devices changed:
libvirt now expects to create the TAP device and manage its lifecycle itself.
  In the case of using OpenStack with Calico networking, Calico is designed to 
pre-create the TAP device before handing it off to libvirt. Because libvirt 
10.0.0 tries to "own" the device Nova already created, it fails to launch the 
instance.
  
  The patch explicitly adds managed="no" to the interface configuration in
  the libvirt domain XML. This tells libvirt to skip its management
  attempt and simply use the device provided by Nova, restoring the
  intended workflow.
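
As a rough illustration of the XML change described above, the sketch below builds an interface element whose target carries managed="no". It uses Python's standard xml.etree.ElementTree rather than the lxml module Nova uses, and the helper name build_tap_interface, the "ethernet" interface type and the device name are assumptions made for this example, not Nova's actual code.

```python
import xml.etree.ElementTree as ET


def build_tap_interface(target_dev, managed="no"):
    """Return an <interface> element for a TAP device that already exists.

    Hypothetical helper for illustration only; not part of Nova.
    """
    iface = ET.Element("interface", type="ethernet")
    # managed='no' asks libvirt to use the existing device as-is
    # instead of trying to create and own it.
    ET.SubElement(iface, "target", dev=target_dev, managed=managed)
    return iface


if __name__ == "__main__":
    # "tap-example0" is a made-up device name.
    print(ET.tostring(build_tap_interface("tap-example0"), encoding="unicode"))
```

Defaulting the argument to "no" mirrors the approach of the potential patch in the old description, which sets the attribute whenever a target device is present.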
  
  [ Test Plan ]
  
- Ubuntu does not have support for Neutron Calico (it isn't packaged) but
- to test and verify this issue we don't need it because it is just a
- matter of adding a tap device to an existing vm which we can do
- manually.
+ Ubuntu does not have support for Neutron Calico (it isn't packaged).
+ While it is possible to deploy OpenStack Caracal on Ubuntu Noble and
+ configure it to use Calico, doing so requires additional manual
+ configuration and is significantly more complex.
  
- To test this we can do:
+ Calico uses the tap VIF type. In Nova, this causes the networking code
+ to follow a legacy, non-OpenStack-specific code path, which is the area
+ affected by this patch.
  
-  * Deploy Openstack with Neutron OVN
  
-  * Create a guest vm with one port (the tap device will be created by
- libvirt)
+ A way to validate this patch is as follows.
  
- # openstack server create --flavor m1.tiny --image cirros --network
- test-net test-vm
+ 1. Deploy OpenStack with ML2/OVS
  
-  * Stop the vm
+ This is important because OVS allows the vif_type of a port to be
+ modified directly in the Neutron database. Other backends, such as OVN,
+ will overwrite the vif_type and force it back to ovs.
  
- # openstack server stop test-vm
- or
- # virsh shutdown instance-00000001
+ 2. Create a Neutron port and note the returned port ID
  
-  * Manually create a new tap device and add it to the vm libvirt xml
+ $ openstack port create --network <network_id> my-tap-port
  
- # sudo ip tuntap add dev tap1 mode tap
- # sudo ip link set tap1 up
- # virsh edit instance-00000001
+ 3. Modify the port binding
  
- On the <interface> section replace with  <target dev='tap1'/>
+ Connect to the MySQL database and select the Neutron database:
  
-  * Start the vm
+ mysql> USE neutron;
  
- # virsh start instance-00000001
- error: Failed to start domain 'instance-00000001'
- error: Requested operation is not valid: The tap1 interface already exists
+ Update the port binding to use the tap VIF type. Replace the host and
+ port_id values with those appropriate for your environment:
  
- * Without the patch this will cause an error but with the patch it
- should work.
+ mysql> UPDATE ml2_port_bindings
+        SET vif_type = 'tap',
+        host = 'juju-adc18e-flamingo-ovs2-9',
+        status = 'ACTIVE'
+        WHERE port_id = 'b6953c43-05a9-41fc-848a-788493a2197f';
  
- * With the patch, the VM can boot successfully.
+ Verify the update:
  
- This patch explicitly adds managed="no" to the interface configuration
- in the libvirt domain XML. This tells libvirt to skip its management
- attempt and simply use the device provided by Nova, restoring the
- intended workflow.
+ mysql> SELECT port_id, host, vif_type
+        FROM ml2_port_bindings
+        WHERE port_id = 'b6953c43-05a9-41fc-848a-788493a2197f';
  
- We will do this manually and edit the XML again and add managed = no.
  
- Now try to start the VM again;
- # virsh start instance-00000001
- Domain 'instance-00000001' started
+ 4. Launch a VM using the newly created TAP port
+ 
+ $ openstack server create --flavor m1.small --image cirros-0.4.0 \
+     --port b6953c43-05a9-41fc-848a-788493a2197f --key-name mykey \
+     my-forced-tap-vm
+ 
+ 
+ # Expected Results
+ 
+ ## Without the patch
+ 
+ The instance enters the ERROR state and fails to boot.
+ Nova logs contain an error similar to:
+ ERROR nova.virt.libvirt.guest libvirt.libvirtError: Requested operation is not valid: The tapc5df06f4-2d interface already exists
+ 
+ 
+ ## With the patch
+ 
+ The instance boots successfully and transitions to the ACTIVE state.
+ 
+ This was verified on Noble/Caracal and Noble/Flamingo, both deployed
+ with OVS.
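
One possible extra check, sketched here under assumptions, is to inspect the domain XML of the booted instance (for example as printed by virsh dumpxml) and confirm that the interface target carries managed="no". The function name unmanaged_tap_targets and the sample XML are made up for this example; only the device name is taken from the error message in the test plan.

```python
import xml.etree.ElementTree as ET


def unmanaged_tap_targets(domain_xml):
    """Map each interface target device to True if it has managed='no'."""
    root = ET.fromstring(domain_xml.strip())
    result = {}
    for target in root.findall("./devices/interface/target"):
        result[target.get("dev")] = target.get("managed") == "no"
    return result


# Made-up sample, trimmed to the elements relevant for the check.
SAMPLE = """
<domain type='kvm'>
  <devices>
    <interface type='ethernet'>
      <target dev='tapc5df06f4-2d' managed='no'/>
    </interface>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(unmanaged_tap_targets(SAMPLE))
```

A False value for a TAP device would suggest the instance was defined without the patched XML generation.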
  
  [ Where problems could occur ]
  
  This change specifically targets the XML generation for TAP interfaces.
  Since Noble requires libvirt >= 10.0.0, we are not worried about backwards 
compatibility with extremely old libvirt versions that might not recognize the 
attribute.
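
For readers who want to check which side of the 9.5.0 boundary a host is on, here is a minimal sketch. It assumes the version is available as the single integer libvirt reports (major * 1000000 + minor * 1000 + release, as returned for instance by getLibVersion() in the Python bindings); the function names and the constant are hypothetical and not part of the patch.

```python
def decode_libvirt_version(number):
    """Split libvirt's integer version into (major, minor, release)."""
    major, rest = divmod(number, 1_000_000)
    minor, release = divmod(rest, 1_000)
    return (major, minor, release)


# First release with the changed TAP behavior, per the description above.
TAP_BEHAVIOR_CHANGE = (9, 5, 0)


def affected_by_tap_change(number):
    """Return True if this libvirt version expects to manage TAP devices."""
    return decode_libvirt_version(number) >= TAP_BEHAVIOR_CHANGE


if __name__ == "__main__":
    # 10000000 encodes libvirt 10.0.0, the minimum mentioned for Noble.
    print(decode_libvirt_version(10_000_000), affected_by_tap_change(10_000_000))
```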
  
  [ Other Info ]
  
  The bug has been reported upstream:
  https://bugs.launchpad.net/nova/+bug/2033681
  
  This fix is already merged upstream in Nova (see:
  https://review.opendev.org/c/openstack/nova/+/967570) and is required
  for Nova to function correctly on any distribution using libvirt 9.5.0
  or newer, which includes Ubuntu Noble.
  
  [ Old description ]
  Description
  ===========
  Calico (out of tree) uses vif type tap. But libvirt doesn't like pre-existing 
tap devices https://github.com/libvirt/libvirt/commit/a2ae3d299cf from libvirt 
9.5.0. This causes openstack clusters that run calico networking backend to 
fail during instance creation.
  
  Steps to reproduce
  ==================
  
  Expected result
  ===============
  The VM is able to boot without any problems
  
  Actual result
  =============
  
  Other information
  =================
  
  13:34:38 < sean-k-mooney> calico is apparently still using vif type tap
  
https://github.com/projectcalico/calico/blob/cf7fa35475eba84f5afcd7f53ac7d07dcb403202/networking-calico/networking_calico/plugins/ml2/drivers/calico/test/lib.py#L66C31-L66C34
  
  13:35:06 < sean-k-mooney> vif type tap is not supported by our os-vif code so
its using the legacy fallback
  13:35:51 < sean-k-mooney> 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L595-L596
  13:36:15 < sean-k-mooney> 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L420-L430
  13:36:48 < sean-k-mooney> 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/designer.py#L44-L55
  
  13:37:40 < sean-k-mooney> zer0c00l: with that said the tap was always meant to
be created by libvirt so it sounds like calico might have been doing things it
should not have been
  13:38:03 < zer0c00l> sean-k-mooney: Thanks for looking into this. :(
  13:38:36 < sean-k-mooney> we could probably correct this with a bug fix
  13:38:52 < sean-k-mooney> just setting managed='no'
  13:39:13 < sean-k-mooney> here 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L427
  13:39:54 < sean-k-mooney> the problem is that there is no way to test
this really upstream
  13:40:06 < sean-k-mooney> well beyond unit/functional tests
  13:40:12 < sean-k-mooney> but we dont have any calico ci
  13:40:37 < sean-k-mooney> calico should be the only backend using vif_type=tap
  13:40:52 < sean-k-mooney> but im not sure if we would need a config option in
the workarounds section for this or not
  
  Potential patch
  ===============
  diff --git a/nova/virt/libvirt/config.py b/nova/virt/libvirt/config.py
  index 47e92e3..5af3ce4 100644
  --- a/nova/virt/libvirt/config.py
  +++ b/nova/virt/libvirt/config.py
  @@ -1749,6 +1749,7 @@
           self.device_addr = None
           self.mtu = None
           self.alias = None
  +        self.managed = 'no'
  
       def __eq__(self, other):
           if not isinstance(other, LibvirtConfigGuestInterface):
  @@ -1851,7 +1852,7 @@
               dev.append(vlan_elem)
  
           if self.target_dev is not None:
  -            dev.append(etree.Element("target", dev=self.target_dev))
  +            dev.append(etree.Element("target", dev=self.target_dev, managed=self.managed))
  
           if self.vporttype is not None:
               vport = etree.Element("virtualport", type=self.vporttype)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2033681

Title:
  Calico still uses vif type tap and it causes failures with libvirt
  9.5.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2033681/+subscriptions

