Public bug reported:
Detected while reading the code.
Reproduction
1) configure a host with a single PCI passthrough device
2) configure two PCI aliases (a1, a2) with different names but each matching
the above device
3) boot an instance with 'pci_passthrough:alias': 'a1:1,a2:1' flavor extra_spec.
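For illustration, the extra spec in step 3 expands into two separate device
requests that carry the same vendor/product spec. A minimal Python sketch of
that expansion follows; ALIASES and expand_alias_spec are hypothetical names
used only for this example and are not nova's actual code. The vendor and
product IDs are the ones that appear in the scheduler log in this report.

```python
# Hypothetical stand-in for the two configured aliases (not nova's config
# format). Both aliases match the same single device on the host.
ALIASES = {
    "a1": {"vendor_id": "8086", "product_id": "1533"},
    "a2": {"vendor_id": "8086", "product_id": "1533"},
}


def expand_alias_spec(extra_spec):
    """Turn a string such as 'a1:1,a2:1' into one request dict per alias."""
    requests = []
    for item in extra_spec.split(","):
        name, count = item.strip().split(":")
        requests.append(
            {"alias_name": name, "count": int(count), "spec": [ALIASES[name]]}
        )
    return requests


requests = expand_alias_spec("a1:1,a2:1")
for request in requests:
    print(request)
```

The two resulting requests ask for two devices in total, while the host only
exposes one.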
Expected result
The instance fails to schedule
Actual result
The instance schedules to the host but has no PCI allocations
The nova scheduler logs:
Selected host: compute1 failed to consume from instance. Error: PCI device request [InstancePCIRequest(alias_name='a1',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}]), InstancePCIRequest(alias_name='a2',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}])] failed
The nova compute logs:
Failed to allocate PCI devices for instance. Unassigning devices back to pools.
This should not happen, since the scheduler should have accurate information,
and allocation during claims is controlled via a hold on the compute node
semaphore.
I think the root cause of the fault is that the
PciDeviceStats.support_requests() [1] call matches each
InstancePCIRequest object independently against the available PCI pools and
does not update the state of the pools locally between requests. As a result,
the single device is counted as available for both a1 and a2.
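The suspected difference can be sketched with a simplified model. This is not
the nova implementation: the pool and request dicts, and the functions
matches, supports_independently and supports_cumulatively, are hypothetical
and only mimic the behaviour described above under the assumption that a pool
is a spec plus a free-device count.

```python
import copy


def matches(pool, spec):
    """A pool matches when every key of the spec has the same value in it."""
    return all(pool.get(key) == value for key, value in spec.items())


def supports_independently(pools, requests):
    """Check each request against the untouched pools (suspected behaviour)."""
    return all(
        sum(p["count"] for p in pools if matches(p, r["spec"])) >= r["count"]
        for r in requests
    )


def supports_cumulatively(pools, requests):
    """Check requests in order, consuming devices from a local pool copy."""
    pools = copy.deepcopy(pools)  # never mutate the caller's view
    for request in requests:
        needed = request["count"]
        for pool in pools:
            if needed == 0:
                break
            if matches(pool, request["spec"]):
                taken = min(pool["count"], needed)
                pool["count"] -= taken
                needed -= taken
        if needed:
            return False
    return True


# One host pool holding a single device, and two requests for that device.
pools = [{"vendor_id": "8086", "product_id": "1533", "count": 1}]
spec = {"vendor_id": "8086", "product_id": "1533"}
requests = [
    {"alias_name": "a1", "count": 1, "spec": spec},
    {"alias_name": "a2", "count": 1, "spec": spec},
]

independent_ok = supports_independently(pools, requests)
cumulative_ok = supports_cumulatively(pools, requests)
print(independent_ok, cumulative_ok)
```

In this model the independent check accepts the host and the cumulative check
rejects it, which matches the expected versus actual results in this report.
The greedy consumption order is only for illustration; a real fix may need to
consider which pool to consume from when requests overlap only partially.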
I will push a functional reproduction test shortly.
[1]
https://github.com/openstack/nova/blob/69bc4c38d1c5b98fcbbe8b16a7dfeb654e3b8173/nova/pci/stats.py#L645
** Affects: nova
Importance: Undecided
Status: New
** Tags: pci
https://bugs.launchpad.net/bugs/1986838
Title:
Booting with two identical PCI aliases on a host with a single
matching dev succeeds but the instance will have no PCI allocations
Status in OpenStack Compute (nova):
New