Answering my own question a bit faster than I thought I could. The nova DB has a pci_devices table.
What happened was that there was an intermediate state where the
pci_passthrough_whitelist value on the hypervisor was missing.
Apparently during that time the rows for this hypervisor in the
pci_devices table got marked as deleted. Then when nova.conf got fixed
they got recreated (even though the old 'deleted' resources were really
actively in use), so I end up with this colliding state:

> SELECT
>   created_at,deleted_at,deleted,id,compute_node_id,address,status,instance_uuid
> FROM pci_devices WHERE address='0000:09:00.0';
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| created_at          | deleted_at          | deleted | id | compute_node_id | address      | status    | instance_uuid                        |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| 2016-07-06 00:12:30 | 2016-10-13 21:04:53 |       4 |  4 |              90 | 0000:09:00.0 | allocated | 9269391a-4ce4-4c8d-993d-5ad7a9c3879b |
| 2016-10-18 18:01:35 | NULL                |       0 | 12 |              90 | 0000:09:00.0 | available | NULL                                 |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+

Since it's only really 3 entries I can fix this by hand, then head over
to bug report land.

-Jon

On Tue, Oct 18, 2016 at 02:50:11PM -0400, Jonathan D. Proulx wrote:
:Hi all,
:
:I have a test GPU system that seemed to be working properly under Kilo
:running 1 and 2 GPU instance types on an 8GPU server.
:
:After Mitaka upgrade it seems to always try and assign the same Device
:which is already in use rather than pick one of the 5 currently
:available.
:
:
: Build of instance 9542cc63-793c-440e-9a57-cc06eb401839 was
: re-scheduled: Requested operation is not valid: PCI device
: 0000:09:00.0 is in use by driver QEMU, domain instance-000abefa
: _do_build_and_run_instance
: /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1945
:
:it tries to schedule 5 times, but each time uses the same busy
:device. Since there are currently 3 in use if it had just picked a
:new one each time
:
:In trying to debug this I realize I have no idea how devices are
:selected. Does OpenStack track which PCI devices are claimed or is
:that a libvirt function and in either case where would I look to find
:out what it thinks the current state is?
:
:Thanks,
:-Jon
:--
--
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
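
For anyone who wants to check a pci_devices table for the same colliding
state, here is a rough sketch of one way to look for it. It is only an
illustration: it uses an in-memory SQLite database as a stand-in for the
nova DB, models only the columns shown in the query above (the column
types are guesses), and loads just the two rows from that output. The
self-join and the aliases `stale` and `live` are mine, not anything nova
provides, and the sketch only reports collisions; it does not repair them.

```python
import sqlite3

# In-memory SQLite stand-in for the nova DB (assumption: the real table has
# more columns; only those shown in the query output are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pci_devices (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        deleted_at TEXT,
        deleted INTEGER,
        compute_node_id INTEGER,
        address TEXT,
        status TEXT,
        instance_uuid TEXT
    )
    """
)

# The two rows from the query output above, in table column order.
rows = [
    (4, "2016-07-06 00:12:30", "2016-10-13 21:04:53", 4, 90,
     "0000:09:00.0", "allocated", "9269391a-4ce4-4c8d-993d-5ad7a9c3879b"),
    (12, "2016-10-18 18:01:35", None, 0, 90,
     "0000:09:00.0", "available", None),
]
conn.executemany("INSERT INTO pci_devices VALUES (?,?,?,?,?,?,?,?)", rows)

# Pair each soft-deleted row that is still marked 'allocated' with a live
# 'available' row for the same address on the same compute node.
COLLISION_QUERY = """
    SELECT stale.compute_node_id, stale.address,
           stale.id, live.id, stale.instance_uuid
    FROM pci_devices AS stale
    JOIN pci_devices AS live
      ON live.compute_node_id = stale.compute_node_id
     AND live.address = stale.address
    WHERE stale.deleted != 0 AND stale.status = 'allocated'
      AND live.deleted = 0 AND live.status = 'available'
    ORDER BY stale.address
"""

collisions = conn.execute(COLLISION_QUERY).fetchall()
for node_id, address, stale_id, live_id, instance_uuid in collisions:
    print(f"node {node_id} device {address}: deleted row {stale_id} is still "
          f"allocated to {instance_uuid}, live row {live_id} says available")
```

Against a real deployment the same SELECT would have to be run through
whatever client the nova DB uses, and any hit would still need to be
checked against what is actually running on the hypervisor before
changing rows by hand.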
