Reviewed:  https://review.opendev.org/749175
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b8695de6da56db42b83b9d9d4c330148766644be
Submitter: Zuul
Branch:    master
commit b8695de6da56db42b83b9d9d4c330148766644be
Author: Hemanth Nakkina <[email protected]>
Date:   Tue Sep 1 09:36:51 2020 +0530

    Update pci stat pools based on PCI device changes

    At start up of nova-compute service, the PCI stat pools are
    populated based on information in pci_devices table in Nova
    database. The pools are updated only when new device is added
    or removed but not on any device changes like device type.

    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).

    This patch looks for device type updates in existing devices
    and updates the pools accordingly.

    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged

Bug description:

When spawning an SR-IOV enabled instance on a newly deployed host, nova attempts to spawn it with a type-PF PCI device. This fails with the stack trace below.

After the neutron-sriov-agent and nova-compute services are restarted on the compute node and an SR-IOV instance is spawned again, a type-VF PCI device is selected and the instance spawns successfully.
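To make the commit message concrete, here is a minimal Python sketch of pool bookkeeping. Everything in it is hypothetical and simplified: `PciDev`, `PoolStats`, `sync` and the pool-key fields are illustrative names, not nova's `PciDeviceStats` API. The starting `type-PCI` classification of the PF is also an assumed example of a stale type. The sketch shows why grouping devices by a key that contains `dev_type` requires an explicit move when an existing device's type changes. That move is the case the patch adds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PciDev:
    """Hypothetical, trimmed-down PCI device record (not nova's PciDevice)."""
    address: str
    vendor_id: str
    product_id: str
    dev_type: str           # e.g. "type-PCI", "type-PF", "type-VF"
    physical_network: str   # assumed pool tag, e.g. "physnet2"


class PoolStats:
    """Toy pool tracker: devices are grouped by the attributes used as a pool key."""

    def __init__(self):
        # pool key -> {PCI address: device}
        self.pools = defaultdict(dict)

    @staticmethod
    def _key(dev):
        return (dev.vendor_id, dev.product_id, dev.dev_type, dev.physical_network)

    def add_device(self, dev):
        self.pools[self._key(dev)][dev.address] = dev

    def remove_device(self, dev):
        self.pools.get(self._key(dev), {}).pop(dev.address, None)

    def update_device(self, old, new):
        """Re-file a device under the pool that matches its current attributes."""
        self.remove_device(old)
        self.add_device(new)

    def devices_of_type(self, dev_type):
        return sorted(addr
                      for key, pool in self.pools.items() if key[2] == dev_type
                      for addr in pool)


def sync(stats, known, discovered):
    """Reconcile pools with freshly discovered devices (dicts keyed by address)."""
    for addr, dev in discovered.items():
        if addr not in known:
            stats.add_device(dev)                  # new device
        else:
            stats.update_device(known[addr], dev)  # existing device, maybe retyped
    for addr, dev in known.items():
        if addr not in discovered:
            stats.remove_device(dev)               # device disappeared
    return dict(discovered)


if __name__ == "__main__":
    stats = PoolStats()
    pf_seen_early = PciDev("0000:d8:00.1", "15b3", "1015", "type-PCI", "physnet2")
    known = sync(stats, {}, {pf_seen_early.address: pf_seen_early})
    pf_now = PciDev("0000:d8:00.1", "15b3", "1015", "type-PF", "physnet2")
    known = sync(stats, known, {pf_now.address: pf_now})
    print(stats.devices_of_type("type-PCI"))  # []
    print(stats.devices_of_type("type-PF"))   # ['0000:d8:00.1']
```

Suppose the `else` branch in `sync` did nothing, which models the pre-patch behaviour. The address would then stay in the `type-PCI` pool after the second call. This mirrors the stale `pci_tracker.stats` pool described in the commit message.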
Stack trace:

2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     post_xml_callback=post_xml_callback)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     guest.launch(pause=pause)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self._encoded_xml, errors='ignore')
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     return self._domain.createWithFlags(flags)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = execute(f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(c, e, tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = meth(*args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance

To reproduce, bring up an instance with an SR-IOV port on a freshly deployed compute node:

+ openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
+ openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164

Observe that a PF is selected for the SR-IOV NIC.
From nova-compute.log:

<interface type='hostdev' managed='yes'>
  <mac address='98:03:9b:61:22:e9'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
  </source>
  <vlan>
    <tag id='48'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...
2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] vif_type=hw_veb ... vif={"profile": {"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"}, "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257", "physical_network": "physnet2", "mtu": 9000}, "id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {}, "details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null} virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572

The device is a PF:

# lspci | grep d8:00.1
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

The nova pci_devices table also lists its dev_type correctly:

mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| host             | created_at          | compute_node_id | address      | dev_type | status    | dev_id           |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:19:00.1 | type-PF  | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:d8:00.1 | type-PF  | available | pci_0000_d8_00_1 |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+

Restarting services:

# systemctl status neutron-sriov-agent.service
# systemctl restart neutron-sriov-agent.service

When an instance is spawned again, it is allocated a type-VF port and spawning succeeds:

<interface type='hostdev' managed='yes'>
  <mac address='fa:16:3e:34:d2:99'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
  </source>
  <vlan>
    <tag id='4'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>

# lspci | grep d8:05.1
d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]

After the instance is spawned, the PF is marked as "unavailable" in the nova database:

+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| host             | created_at          | updated_at          | instance_uuid | compute_node_id | address      | dev_type | status      | dev_id           |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL          |              95 | 0000:19:00.1 | type-PF  | available   | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL          |              95 | 0000:d8:00.1 | type-PF  | unavailable | pci_0000_d8_00_1 |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+

Software versions:

# dpkg -l | grep nova-common
ii nova-common 2:17.0.12-0ubuntu1 all OpenStack Compute - common files
# dpkg -l | grep libvirt0
ii libvirt0:amd64 4.0.0-1ubuntu8.17 amd64 library for interfacing with different virtualization systems
# lsb_release -r
Release: 18.04
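The last table shows 0000:d8:00.1 switching from available to unavailable after a VF on the same card was handed to an instance. That fits a rule where allocating a VF blocks its parent PF, and allocating a whole PF blocks its VFs. The sketch below models only that rule. The `Dev`, `add_vf` and `allocate` names are hypothetical, and the code is not nova's `PciDevice` implementation. Treating 0000:d8:05.1 as a child of 0000:d8:00.1 is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Status strings mirroring the values seen in the pci_devices table.
AVAILABLE, ALLOCATED, UNAVAILABLE = "available", "allocated", "unavailable"


@dataclass
class Dev:
    """Toy PCI device with an optional parent PF / child VFs relationship."""
    address: str
    dev_type: str  # "type-PF", "type-VF" or "type-PCI"
    status: str = AVAILABLE
    # Excluded from repr/eq so the parent <-> child cycle never recurses.
    parent: Optional["Dev"] = field(default=None, repr=False, compare=False)
    children: List["Dev"] = field(default_factory=list, repr=False, compare=False)


def add_vf(pf, address):
    """Create a VF under ``pf`` and link both directions."""
    vf = Dev(address, "type-VF", parent=pf)
    pf.children.append(vf)
    return vf


def allocate(dev):
    """Allocate ``dev`` and block the devices that share its hardware."""
    if dev.status != AVAILABLE:
        raise ValueError(f"{dev.address} is {dev.status}")
    dev.status = ALLOCATED
    if dev.dev_type == "type-VF" and dev.parent is not None:
        # With one VF in use, the PF can no longer be passed through whole.
        if dev.parent.status == AVAILABLE:
            dev.parent.status = UNAVAILABLE
    elif dev.dev_type == "type-PF":
        # A PF passed through whole takes all of its VFs with it.
        for vf in dev.children:
            vf.status = UNAVAILABLE


if __name__ == "__main__":
    pf = Dev("0000:d8:00.1", "type-PF")
    vf = add_vf(pf, "0000:d8:05.1")  # assumed parent/child pairing
    allocate(vf)
    print(pf.status, vf.status)  # unavailable allocated
```

Under this model, the earlier failure would correspond to `allocate` being called on the PF itself. That is possible only when the allocator does not recognise the device as a PF in the first place.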

