[Yahoo-eng-team] [Bug 1986838] [NEW] Booting with two identical PCI aliases on a host with a single matching dev succeeds but the instance will have no PCI allocations

Balazs Gibizer Wed, 17 Aug 2022 09:02:14 -0700

Public bug reported:

Detected while reading the code.


Reproduction
1) Configure a host with a single PCI passthrough device.
2) Configure two PCI aliases (a1, a2) with different names, each matching the above device.
3) Boot an instance with the flavor extra_spec 'pci_passthrough:alias': 'a1:1,a2:1'.
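
To make the reproduction concrete, here is a minimal sketch of how the extra_spec value expands into two separate requests that both describe the same device. The `ALIASES` table and the `parse_alias_spec` helper are hypothetical and written for illustration only; they are not nova's code. The vendor and product IDs are the ones shown in the scheduler log in this report.

```python
# Hypothetical alias table mirroring the reproduction: two different names,
# one identical device spec (IDs taken from the scheduler log in this report).
ALIASES = {
    "a1": {"vendor_id": "8086", "product_id": "1533"},
    "a2": {"vendor_id": "8086", "product_id": "1533"},
}


def parse_alias_spec(extra_spec_value):
    """Turn 'a1:1,a2:1' into a list of simple request dicts (illustrative only)."""
    requests = []
    for item in extra_spec_value.split(","):
        name, _, count = item.strip().partition(":")
        requests.append(
            {"alias_name": name, "count": int(count or 1), "spec": ALIASES[name]}
        )
    return requests


requests = parse_alias_spec("a1:1,a2:1")
# Two requests, one device each, so two devices are needed in total.
total_requested = sum(r["count"] for r in requests)
```

With a single matching device on the host, `total_requested` exceeds what is available, even though each request looks satisfiable when considered alone.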

Expected result
The instance fails to schedule, because the request needs two devices and the host has only one.

Actual result
The instance schedules to the host but has no PCI allocations.

The nova scheduler logs:
Selected host: compute1 failed to consume from instance. Error: PCI device request [InstancePCIRequest(alias_name='a1',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}]), InstancePCIRequest(alias_name='a2',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}])] failed

The nova compute logs:
Failed to allocate PCI devices for instance. Unassigning devices back to pools. 
This should not happen, since the scheduler should have accurate information, 
and allocation during claims is controlled via a hold on the compute node 
semaphore.

I think the root cause of the fault is that the
PciDeviceStats.support_requests() [1] call matches each
InstancePCIRequest object independently against the available PCI pools
and does not update the state of the pools locally between requests, so
the same single device can satisfy both requests.
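
The suspected difference can be sketched as follows. This is a simplified model, not nova's implementation: pools are plain dicts with a `count` field, and `matches`, `support_requests_independent` and `support_requests_consuming` are hypothetical names. The first function checks each request against untouched pools, as described above; the second consumes from a local copy of the pools so that later requests see what earlier ones took.

```python
import copy


def matches(pool, spec):
    """A pool matches when it carries every key/value pair of the spec."""
    return all(pool.get(key) == value for key, value in spec.items())


def support_requests_independent(pools, requests):
    """Check each request on its own; the pool counts are never reduced."""
    return all(
        sum(p["count"] for p in pools if matches(p, r["spec"])) >= r["count"]
        for r in requests
    )


def support_requests_consuming(pools, requests):
    """Check requests in order, consuming devices from a local copy of the pools."""
    local_pools = copy.deepcopy(pools)  # leave the caller's pools untouched
    for request in requests:
        remaining = request["count"]
        for pool in local_pools:
            if remaining == 0:
                break
            if matches(pool, request["spec"]):
                taken = min(pool["count"], remaining)
                pool["count"] -= taken
                remaining -= taken
        if remaining > 0:
            return False
    return True


# One pool holding a single device, and two requests with the same spec.
pools = [{"vendor_id": "8086", "product_id": "1533", "count": 1}]
spec = {"vendor_id": "8086", "product_id": "1533"}
requests = [
    {"alias_name": "a1", "count": 1, "spec": spec},
    {"alias_name": "a2", "count": 1, "spec": spec},
]

independent_ok = support_requests_independent(pools, requests)  # True
consuming_ok = support_requests_consuming(pools, requests)      # False
```

The greedy consumption here is enough for identical specs; with overlapping but different specs, a real fix might need a more careful assignment of devices to requests.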

I will push a functional reproduction test shortly.

[1]
https://github.com/openstack/nova/blob/69bc4c38d1c5b98fcbbe8b16a7dfeb654e3b8173/nova/pci/stats.py#L645

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: pci

** Tags added: pci

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1986838
