Hi Jay,

Below are a few logs and some information you may want to check.
I wrote the GPU information into nova.conf like this:

pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de" },
                             { "product_id":"68c8", "vendor_id":"1002" }]
pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" },
             { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800" }]

Then I restarted the services.

nova-compute log after inserting the new GPU device info into nova.conf and restarting the service:
http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/

What's strange is that the log shows the resource tracker collecting information only on the newly installed GPU, not the old one. But if I perform some action on the instance that contains the old GPU, the tracker picks up both GPUs:
http://paste.openstack.org/show/614658/

The Nova database shows correct information for both GPUs:
http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/

Next, I removed the "1002:68c8" device from nova.conf and from the compute node, and restarted the services. pci_passthrough_whitelist and pci_alias now keep only the "10de:0ff3" GPU info:

pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }

The nova-compute log shows the resource tracker reporting that the node has only the "10de:0ff3" PCI resource:
http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/

But in the Nova database, "1002:68c8" still exists and stays in "Available" status, even though its "deleted" value is non-zero (two small Python sketches for checking and cleaning this up are at the bottom of this mail, below the quoted thread):
http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/

Many thanks,
Eddie.

2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

> Uh wait,
>
> Is it possible that it still shows available because a PCI device still
> exists at the same address?
>
> Because when I removed the GPU card, I replaced it with an SFP+ network
> card in the same slot. So when I type lspci, the SFP+ card sits at the
> same address.
>
> But it still doesn't make any sense, because these two cards are
> definitely not the same VID:PID, and I set the information as VID:PID
> in nova.conf.
>
> I'll try to reproduce this issue and put a log on this list.
>
> Thanks,
>
> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>
>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>> when you removed the GPU and restarted the nova-compute service and
>> paste those logs to paste.openstack.org? Would be useful in tracking
>> down this buggy behaviour...
>>
>> Best,
>> -jay
>>
>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>
>>> Hi Jay,
>>>
>>> The status of the "removed" GPU still shows as "Available" in the
>>> pci_devices table.
>>>
>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>>
>>> Hi again, Eddie :) Answer inline...
>>>
>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>
>>> Hi everyone,
>>>
>>> I'm using the OpenStack Mitaka version (deployed from Fuel 9.2).
>>>
>>> At present, I have two different models of GPU card installed.
>>>
>>> I wrote their information into pci_alias and pci_passthrough_whitelist
>>> in nova.conf on the Controller and the Compute node (the node with the
>>> GPUs installed), then restarted nova-api, nova-scheduler, and
>>> nova-compute.
>>>
>>> When I checked the database, both GPUs were registered in the
>>> pci_devices table.
>>>
>>> Then I removed one of the GPUs from the compute node, removed its
>>> information from nova.conf, and restarted the services.
>>>
>>> But when I checked the database again, the information for the removed
>>> card still existed in the pci_devices table.
>>>
>>> How can I fix this problem?
>>> So, when you removed the GPU from the compute node and restarted the
>>> nova-compute service, it *should* have noticed you had removed the
>>> GPU and marked that PCI device as deleted. At least, according to
>>> this code in the PCI manager:
>>>
>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>
>>> Question for you: what is the value of the status field in the
>>> pci_devices table for the GPU that you removed?
>>>
>>> Best,
>>> -jay
>>>
>>> p.s. If you really want to get rid of that device, simply remove
>>> that record from the pci_devices table. But, again, it *should* be
>>> removed automatically...
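A note on the sync code Jay linked: below is my own rough paraphrase, in
plain Python, of what that reconciliation step in nova/pci/manager.py is
supposed to do. This is only a sketch of the idea, not the actual Nova
code; the function name sync_pci_devices and the dict layout are mine.

# Rough sketch of the PCI tracker sync idea (NOT the actual Nova code).
# "tracked" and "reported" are dicts keyed by PCI address, e.g.
# "0000:05:00.0"; values are dicts with at least vendor_id/product_id.
def sync_pci_devices(tracked, reported):
    for address, dev in list(tracked.items()):
        if address not in reported:
            # The hypervisor no longer reports this device, so it should
            # be flagged as removed instead of staying "available".
            dev["status"] = "removed"
    for address, dev in reported.items():
        if address not in tracked:
            # Newly discovered device: start tracking it as available.
            tracked[address] = dict(dev, status="available")
    return tracked

If removing my card never triggers the first branch (for example because
the SFP+ card now sits at the same address), that could explain the stale
"Available" row I'm seeing.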
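On manually cleaning the stale row, as Jay suggested in his p.s.: here is
a minimal sketch of the script I use to inspect the leftover pci_devices
record before touching it. It assumes SQLAlchemy with the PyMySQL driver;
the connection URL ("nova:NOVA_DB_PASS@controller/nova") is a placeholder
for my deployment, so adjust it for yours.

# Minimal sketch for inspecting the leftover pci_devices row.
# The credentials and host in the URL below are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://nova:NOVA_DB_PASS@controller/nova")

with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT id, address, status, deleted, deleted_at "
             "FROM pci_devices "
             "WHERE vendor_id = :vid AND product_id = :pid"),
        {"vid": "1002", "pid": "68c8"},
    ).fetchall()
    for row in rows:
        # Expect status "available" and a non-zero "deleted" value for
        # the stale 1002:68c8 entry described above.
        print(row)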
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack