[Yahoo-eng-team] [Bug 1900006] [NEW] Asking for different vGPU types is racey

Sylvain Bauza Thu, 15 Oct 2020 09:46:12 -0700

Public bug reported:

When testing virtual GPUs on Victoria, I wanted to enable two different
vGPU types:


[devices]
enabled_vgpu_types = nvidia-320,nvidia-321

[vgpu_nvidia-320]
device_addresses = 0000:04:02.1,0000:04:02.2

[vgpu_nvidia-321]
device_addresses = 0000:04:02.3
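
To make the intent of this configuration explicit, here is a minimal sketch
that parses it with Python's standard configparser and builds the expected
address-to-type mapping. This is not how nova reads its configuration (nova
uses oslo.config); the helper name addresses_by_type is hypothetical and only
serves the illustration.

```python
import configparser

NOVA_CONF = """
[devices]
enabled_vgpu_types = nvidia-320,nvidia-321

[vgpu_nvidia-320]
device_addresses = 0000:04:02.1,0000:04:02.2

[vgpu_nvidia-321]
device_addresses = 0000:04:02.3
"""


def addresses_by_type(text):
    """Map each PCI address to the vGPU type the configuration assigns it."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_types = parser["devices"]["enabled_vgpu_types"]
    mapping = {}
    for vgpu_type in (t.strip() for t in raw_types.split(",")):
        # Each enabled type is expected to have its own [vgpu_<type>] section.
        section = "vgpu_" + vgpu_type
        for address in parser[section]["device_addresses"].split(","):
            mapping[address.strip()] = vgpu_type
    return mapping


intended = addresses_by_type(NOVA_CONF)
```

With both types honoured, two addresses should map to nvidia-320 and one to
nvidia-321.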


Unfortunately, only the first type was used.
After restarting the nova-compute service, the following warning was logged:
WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef 
None None] The vGPU type 'nvidia-320' was listed in '[devices] 
enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or 
'[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 
'nvidia-320' will be used.


This happens because _get_supported_vgpu_types() is called when the libvirt
driver is created [1], whereas the dynamic per-type CONF options are only
registered by init_host() [2], which is called afterwards.
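
The ordering problem can be reproduced with a toy model. The sketch below is
not nova's code: FakeConf, NoSuchGroupError and the simplified functions are
hypothetical stand-ins, assuming only that a configuration group cannot be
read before it has been registered and that a failed lookup falls back to the
first type, as the warning above suggests.

```python
class NoSuchGroupError(Exception):
    """Raised when a config group is read before it was registered."""


class FakeConf:
    """Toy stand-in for a config object that requires explicit registration."""

    def __init__(self, raw):
        self._raw = raw            # values as parsed from the config file
        self._registered = set()   # groups registered by the code so far

    def register_group(self, group):
        self._registered.add(group)

    def get(self, group, option):
        if group not in self._registered:
            raise NoSuchGroupError(group)
        return self._raw[group][option]


RAW = {
    "devices": {"enabled_vgpu_types": ["nvidia-320", "nvidia-321"]},
    "vgpu_nvidia-320": {"device_addresses": ["0000:04:02.1", "0000:04:02.2"]},
    "vgpu_nvidia-321": {"device_addresses": ["0000:04:02.3"]},
}


def get_supported_vgpu_types(conf):
    """Return the usable types, falling back to the first on a failed lookup."""
    types = conf.get("devices", "enabled_vgpu_types")
    try:
        for vgpu_type in types:
            conf.get("vgpu_" + vgpu_type, "device_addresses")
    except NoSuchGroupError:
        return types[:1]   # mirrors "Only the first type ... will be used"
    return types


def init_host(conf):
    """Register the per-type groups (simplified picture of step [2])."""
    for vgpu_type in conf.get("devices", "enabled_vgpu_types"):
        conf.register_group("vgpu_" + vgpu_type)


conf = FakeConf(RAW)
conf.register_group("devices")                    # static group, known up front
at_driver_init = get_supported_vgpu_types(conf)   # runs first, as in [1]
init_host(conf)                                   # runs afterwards, as in [2]
after_init_host = get_supported_vgpu_types(conf)
```

In this model the first call only sees nvidia-320 because the per-type groups
do not exist yet, while the same call made after init_host() sees both types.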


[1] 
https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

[2]
https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

A simple fix would be to register the dynamic options within
_get_supported_vgpu_types() itself, before they are read.
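
As a rough sketch of that idea, and not the actual patch, the lookup function
can register the per-type groups itself before reading them. FakeConf,
NoSuchGroupError and register_dynamic_groups are hypothetical names used for
this illustration; the only assumption is that registration is idempotent, so
calling it again later from init_host() would be harmless.

```python
class NoSuchGroupError(Exception):
    """Raised when a config group is read before it was registered."""


class FakeConf:
    """Toy stand-in for a config object that requires explicit registration."""

    def __init__(self, raw):
        self._raw = raw
        self._registered = set()

    def register_group(self, group):
        # Adding to a set makes repeated registration a no-op.
        self._registered.add(group)

    def get(self, group, option):
        if group not in self._registered:
            raise NoSuchGroupError(group)
        return self._raw[group][option]


RAW = {
    "devices": {"enabled_vgpu_types": ["nvidia-320", "nvidia-321"]},
    "vgpu_nvidia-320": {"device_addresses": ["0000:04:02.1", "0000:04:02.2"]},
    "vgpu_nvidia-321": {"device_addresses": ["0000:04:02.3"]},
}


def register_dynamic_groups(conf):
    """Register one group per enabled vGPU type; safe to call repeatedly."""
    for vgpu_type in conf.get("devices", "enabled_vgpu_types"):
        conf.register_group("vgpu_" + vgpu_type)


def get_supported_vgpu_types(conf):
    """Register the dynamic groups first, then validate every enabled type."""
    register_dynamic_groups(conf)   # the proposed fix: no reliance on call order
    types = conf.get("devices", "enabled_vgpu_types")
    try:
        for vgpu_type in types:
            conf.get("vgpu_" + vgpu_type, "device_addresses")
    except NoSuchGroupError:
        return types[:1]
    return types


conf = FakeConf(RAW)
conf.register_group("devices")
supported = get_supported_vgpu_types(conf)   # no init_host() call needed first
register_dynamic_groups(conf)                # a later repeat changes nothing
supported_again = get_supported_vgpu_types(conf)
```

Because the function no longer depends on init_host() having run, the result
is the same whichever of the two is called first.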

** Affects: nova
     Importance: Medium
     Assignee: Sylvain Bauza (sylvain-bauza)
         Status: Confirmed


** Tags: libvirt vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1900006
