I'm torn between considering this a wishlist bug or a feature request.
I think this is perhaps related to the resource provider mappings.

With this configuration

[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476

[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0

[vgpu_nvidia-475]
device_addresses = 0000:61:01.7

[vgpu_nvidia-476]
device_addresses = 0000:61:00.6

I would expect 4 resource providers to be created, each with an
inventory of 1 VGPU. Going by the output below, those are:

3cd4dbc7-2c2a-448d-a041-27c8fd685950
7d5abf99-3c42-4c62-ba33-15682c6cfc5b
5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0
58fbbedb-9845-4397-bd20-f559ba68daee

Can you do an inventory show on each and confirm that?

Looking at the flavors, you appear to have added the correct trait
request to have each of them target the appropriate RP.

The approach you are taking was replaced by the generic mdev feature in
Xena:
https://specs.openstack.org/openstack/nova-specs/specs/xena/implemented/generic-mdevs.html
There, instead of tagging the RP manually with a trait, you would use a
different resource class per mdev type.

You are essentially trying to use this feature
https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/vgpu-stein.html
but instead of having multiple physical GPUs you are trying to use MIG
to partition the GPU into VFs first.

That was intended to be enabled by
https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/vgpu-multiple-types.html
However, when that feature was implemented no released GPU supported
MIG or multiple mdev types on the same card. As such, it was only ever
tested with multiple mdev types on the same host, with one pGPU per
mdev type.

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
       Status: Invalid => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008883

Title:
  VGPU - I can only use one MIG profile in Nova

Status in OpenStack Compute (nova):
  Incomplete

Bug description:
  I have one Nvidia A100 card and can only use one MIG profile.
  I divided the card into 4 MIG devices (2x A100-1-5C, 1x A100-2-10C,
  1x A100-3-20C), as below.

  +-----------------------------------------------------------------------------+
  | MIG devices:                                                                |
  +------------------+----------------------+-----------+-----------------------+
  | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
  |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
  |                  |                      |        ECC|                       |
  |==================+======================+===========+=======================|
  |  0    2   0   0  |     10MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
  |                  |      0MiB / 32767MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0    3   0   1  |      6MiB /  9984MiB | 28      0 |  2   0    1    0    0 |
  |                  |      0MiB / 16383MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0    9   0   2  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
  |                  |      0MiB /  8191MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0   10   0   3  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
  |                  |      0MiB /  8191MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+

  My nova configuration:

  /etc/nova/nova.conf

  [devices]
  enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476

  [vgpu_nvidia-474]
  device_addresses = 0000:61:00.4,0000:61:01.0

  [vgpu_nvidia-475]
  device_addresses = 0000:61:01.7

  [vgpu_nvidia-476]
  device_addresses = 0000:61:00.6

  # openstack resource provider list
  +--------------------------------------+-----------------------------------------------+------------+--------------------------------------+--------------------------------------+
  | uuid                                 | name                                          | generation | root_provider_uuid                   | parent_provider_uuid                 |
  +--------------------------------------+-----------------------------------------------+------------+--------------------------------------+--------------------------------------+
  | a0269b89-d43d-4042-a64e-3c832f0bb23f | gpu-a01.example.os-tests.com                  |        104 | a0269b89-d43d-4042-a64e-3c832f0bb23f | None                                 |
  | f2e5a4e0-479e-4ee3-b504-36371ded49f5 | gpu-a01.example.os-tests.com_pci_0000_61_01_4 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | a513c661-6dd2-4462-b719-9fbf7b70c409 | gpu-a01.example.os-tests.com_pci_0000_61_01_2 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 9124d3a8-00fb-475e-a0f8-892ccf5d255e | gpu-a01.example.os-tests.com_pci_0000_61_00_7 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 5f443da2-3c75-45c6-9d8a-05ca8a487802 | gpu-a01.example.os-tests.com_pci_0000_61_02_1 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 20da8814-c5f0-4575-a785-579e9abdbb1d | gpu-a01.example.os-tests.com_pci_0000_61_01_3 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 37014baa-fba6-4f14-8be8-17084b3aad36 | gpu-a01.example.os-tests.com_pci_0000_61_01_1 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | a9e2f509-fb03-45a0-9ff0-7c50143c1a9c | gpu-a01.example.os-tests.com_pci_0000_61_02_2 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 202462de-7d01-45c7-b197-1f2ca5c9c7ae | gpu-a01.example.os-tests.com_pci_0000_61_02_3 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0 | gpu-a01.example.os-tests.com_pci_0000_61_01_7 |         43 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 315b5205-26ac-4ec6-b5d2-623cafc18f39 | gpu-a01.example.os-tests.com_pci_0000_61_01_6 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 208f395d-d9c7-4108-8f83-e48cbea0b637 | gpu-a01.example.os-tests.com_pci_0000_61_02_0 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 7d5abf99-3c42-4c62-ba33-15682c6cfc5b | gpu-a01.example.os-tests.com_pci_0000_61_00_4 |         18 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 315b2c35-401c-4165-add8-2b025961b9a0 | gpu-a01.example.os-tests.com_pci_0000_61_01_5 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | c43be42e-e564-46be-9025-4f00a1f7454e | gpu-a01.example.os-tests.com_pci_0000_61_00_5 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 3cd4dbc7-2c2a-448d-a041-27c8fd685950 | gpu-a01.example.os-tests.com_pci_0000_61_01_0 |         14 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 58fbbedb-9845-4397-bd20-f559ba68daee | gpu-a01.example.os-tests.com_pci_0000_61_00_6 |         27 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  +--------------------------------------+-----------------------------------------------+------------+--------------------------------------+--------------------------------------+

  Created Flavor:

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_1
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_1 3cd4dbc7-2c2a-448d-a041-27c8fd685950
  openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-1 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_1=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_2
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_2 7d5abf99-3c42-4c62-ba33-15682c6cfc5b
  openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-2 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_2=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_3
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_3 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0
  openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-3 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_3=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_4
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_4 58fbbedb-9845-4397-bd20-f559ba68daee
  openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-4 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_4=required

  gpu-a01:~/nvidia-dev-ctl# ls /sys/class/mdev_bus/*/mdev_supported_types
  '/sys/class/mdev_bus/0000:61:00.4/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.5/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.6/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.7/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.0/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.1/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.2/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.3/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.4/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.5/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.6/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.7/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.0/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.1/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.2/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.3/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  Problem:
  ===========

  I can create only two instances with A100-1-5C (nvidia-474); types
  nvidia-475 and nvidia-476 are omitted and I can't use them.
  If I edit /etc/nova/nova.conf and replace

  enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476

  with

  enabled_vgpu_types = nvidia-475,nvidia-476

  I will be able to use only one A100-2-10C (nvidia-475) type.

  Packages:
  ============

  Version: Ussuri

  gpu-a01:~# lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 20.04.4 LTS
  Release:        20.04
  Codename:       focal

  gpu-a01:~# dpkg -l | grep nova
  ii  nova-common           2:21.2.4-0ubuntu1  all  OpenStack Compute - common files
  ii  nova-compute          2:21.2.4-0ubuntu1  all  OpenStack Compute - compute node base
  ii  nova-compute-kvm      2:21.2.4-0ubuntu1  all  OpenStack Compute - compute node (KVM)
  ii  nova-compute-libvirt  2:21.2.4-0ubuntu1  all  OpenStack Compute - compute node libvirt support
  ii  python3-nova          2:21.2.4-0ubuntu1  all  OpenStack Compute Python 3 libraries
  ii  python3-novaclient    2:17.0.0-0ubuntu1  all  client library for OpenStack Compute API - 3.x

  gpu-a01:~# uname -a
  Linux compgpu-a01 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2008883/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
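
As a rough companion to the expectation stated in the reply (4 resource providers, each with an inventory of 1 VGPU), the Python sketch below parses the reported [devices] and [vgpu_*] options and lists the provider names they should correspond to. The naming rule (host name, then "_pci_", then the PCI address with ":" and "." replaced by "_") is inferred from the resource provider list in the report rather than taken from Nova's code, and `expected_providers` is a hypothetical helper, so treat this as an illustration only.

```python
import configparser

# Configuration fragment copied from the bug report.
NOVA_CONF = """
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476

[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0

[vgpu_nvidia-475]
device_addresses = 0000:61:01.7

[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
"""

# Host name as it appears in the resource provider list of the report.
HOST = "gpu-a01.example.os-tests.com"


def expected_providers(conf_text, host):
    """Map each expected child provider name to its configured vGPU type.

    Hypothetical helper: the naming rule is an assumption inferred from
    the `openstack resource provider list` output in the report.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    enabled = parser["devices"]["enabled_vgpu_types"].split(",")
    providers = {}
    for vgpu_type in (t.strip() for t in enabled):
        addresses = parser[f"vgpu_{vgpu_type}"]["device_addresses"].split(",")
        for address in (a.strip() for a in addresses):
            suffix = address.replace(":", "_").replace(".", "_")
            providers[f"{host}_pci_{suffix}"] = vgpu_type
    return providers


providers = expected_providers(NOVA_CONF, HOST)
for name, vgpu_type in providers.items():
    # "1 VGPU each" is the reply's expectation, not a verified result.
    print(f"{name}: {vgpu_type} (expected VGPU inventory: 1)")
```

Comparing the printed names against the resource provider list gives the four providers on which to run the inventory show that the reply asks for.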

