Hi Eddie,

Looking at your nova database, the state after the delete looks correct to me.

| created_at          | updated_at          | deleted_at          | deleted | id
| 2017-06-21 00:56:06 | 2017-07-07 02:27:16 | NULL                |       0 |  2
| 2017-07-07 01:42:48 | 2017-07-07 02:13:14 | 2017-07-07 02:13:42 |       9 |  9

Note that the second row has a deleted_at timestamp and a non-zero deleted 
value (set to the id of the row). Nova performs a soft delete, which only 
marks the row as deleted without actually removing it from the nova 
pci_devices table. See [1] and [2].
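
For illustration, the soft delete that oslo.db's SoftDeleteMixin performs [1] 
boils down to an UPDATE that sets deleted = id and stamps deleted_at. Below is 
a minimal Python sketch using an in-memory sqlite3 mock of the relevant 
pci_devices columns; it is not Nova's real schema or code, just the idea:

```python
import sqlite3
from datetime import datetime, timezone

# Mock of the pci_devices columns involved in soft delete (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    deleted_at TEXT,
    deleted INTEGER NOT NULL DEFAULT 0)""")
conn.executemany("INSERT INTO pci_devices (id) VALUES (?)", [(2,), (9,)])

# Soft delete of row 9: set deleted = id and stamp deleted_at.
# The row is only marked; it is not physically removed.
conn.execute(
    "UPDATE pci_devices SET deleted = id, deleted_at = ? WHERE id = 9",
    (datetime.now(timezone.utc).isoformat(),))

# Normal queries filter on deleted = 0, so the device disappears from Nova's
# view even though the row is still in the table.
active = [r[0] for r in conn.execute(
    "SELECT id FROM pci_devices WHERE deleted = 0")]
total = conn.execute("SELECT COUNT(*) FROM pci_devices").fetchone()[0]
print(active, total)  # [2] 2
```

This is also why the row still shows up in a raw SELECT: only queries that 
filter on deleted = 0 hide it.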

There is a bug with pci_devices in a scenario where an allocated PCI device 
can be deleted, e.g. if pci_passthrough_whitelist is changed; commit [3] tries 
to resolve it.


[1] - 
https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/models.py#L142-L150
[2] - 
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L1411
[3] - https://review.openstack.org/#/c/426243/

From: Eddie Yen [mailto:missile0...@gmail.com]
Sent: Tuesday, July 11, 2017 3:18 AM
To: Jay Pipes <jaypi...@gmail.com>
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] [nova] Database not delete PCI info after device is 
removed from host and nova.conf

Roger that,

I'm going to report this bug on the OpenStack Compute (Nova) Launchpad to see 
what happens.

Anyway, thanks for your help, really appreciate it.


Eddie.

2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
Unfortunately, Eddie, I'm not entirely sure what is going on with your 
situation. According to the code, the non-existing PCI device should be removed 
from the pci_devices table when the PCI manager notices the PCI device is no 
longer on the local host...

On 07/09/2017 08:36 PM, Eddie Yen wrote:
Hi there,

Is the information I provided enough, or do you need additional items?

Thanks,
Eddie.

2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

    Sorry,

    Here is the new nova-compute log after removing "1002:68c8" and
    restarting nova-compute:
    
    http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/

    2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:


        Hi Jay,

        Below are few logs and information you may want to check.



        I wrote the GPU information into nova.conf like this:

        pci_passthrough_whitelist = [{ "product_id":"0ff3",
        "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]

        pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
        "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]


        Then restart the services.
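
        As a side note, the pci_passthrough_whitelist value is JSON, so a
        quick sanity check of the entries is possible before restarting. A
        minimal sketch (the checks are my own, not Nova's actual validation):

```python
import json

# The pci_passthrough_whitelist value from the nova.conf snippet above.
whitelist = ('[{ "product_id":"0ff3", "vendor_id":"10de"},'
             ' { "product_id":"68c8", "vendor_id":"1002"}]')

specs = json.loads(whitelist)  # the value must parse as a JSON list of specs
for spec in specs:
    # Each spec should name vendor_id and product_id as 4-digit hex strings.
    assert {"vendor_id", "product_id"} <= spec.keys()
    assert all(len(spec[k]) == 4 for k in ("vendor_id", "product_id"))

print(["%s:%s" % (s["vendor_id"], s["product_id"]) for s in specs])
# ['10de:0ff3', '1002:68c8']
```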

        nova-compute log after inserting the new GPU device info into
        nova.conf and restarting the service:
        
        http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/

        Strangely, the log shows that the resource tracker only collected
        information for the newly added GPU, not the old one.


        But if I perform some action on the instance that uses the old GPU,
        the tracker will report both GPUs.
        
        http://paste.openstack.org/show/614658/

        The Nova database shows correct information for both GPUs:
        
        http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/



        Now I removed device "1002:68c8" from nova.conf and the compute node,
        and restarted the services.

        The pci_passthrough_whitelist and pci_alias now only keep the
        "10de:0ff3" GPU info:

        pci_passthrough_whitelist = { "product_id":"0ff3",
        "vendor_id":"10de" }

        pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420" }


        The nova-compute log shows the resource tracker reporting that the
        node only has the "10de:0ff3" PCI resource:
        
        http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/

        But in the Nova database, "1002:68c8" still exists and stays in
        "Available" status, even though the "deleted" value is non-zero:
        
        http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/


        Many thanks,
        Eddie.

        2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

            Uh wait,

            Is it possible that it still shows available because a PCI device
            still exists at the same address?

            When I removed the GPU card, I replaced it with an SFP+ network
            card in the same slot, so when I run lspci the SFP+ card occupies
            the same address.

            But that still doesn't make sense, because the two cards are
            definitely not the same VID:PID, and I set the information as
            VID:PID in nova.conf.


            I'll try to reproduce this issue and put a log on this list.

            Thanks,

            2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:

                Hmm, very odd indeed. Any way you can save the
                nova-compute logs from when you removed the GPU and
                restarted the nova-compute service and paste those logs
                to paste.openstack.org?
                Would be useful in tracking down this buggy behaviour...

                Best,
                -jay

                On 07/06/2017 08:54 PM, Eddie Yen wrote:

                    Hi Jay,

                    The status of the "removed" GPU still shows as
                    "Available" in pci_devices table.

                    2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:


                         Hi again, Eddie :) Answer inline...

                         On 07/06/2017 08:14 PM, Eddie Yen wrote:

                             Hi everyone,

                              I'm using OpenStack Mitaka version (deployed
                              from Fuel 9.2).

                              At present, I have two different models of GPU
                              card installed. I wrote their information into
                              pci_alias and pci_passthrough_whitelist in
                              nova.conf on the Controller and the Compute node
                              (the node which has the GPUs installed), then
                              restarted nova-api, nova-scheduler, and
                              nova-compute.

                              When I checked the database, both GPUs were
                              registered in the pci_devices table.

                              Now I have removed one of the GPUs from the
                              compute node, removed its information from
                              nova.conf, and restarted the services.

                              But when I check the database again, the removed
                              card's information still exists in the
                              pci_devices table.

                              How can I fix this problem?


                         So, when you removed the GPU from the compute node
                         and restarted the nova-compute service, it *should*
                         have noticed you had removed the GPU and marked that
                         PCI device as deleted. At least, according to this
                         code in the PCI manager:

                    
                         https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183

                         Question for you: what is the value of the
                    status field in the
                         pci_devices table for the GPU that you removed?

                         Best,
                         -jay

                         p.s. If you really want to get rid of that
                    device, simply remove
                         that record from the pci_devices table. But,
                    again, it *should* be
                         removed automatically...
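
                         For illustration, the difference between that manual
                         removal and the automatic path: only a hard DELETE
                         physically drops the row. Below is a sketch against
                         an in-memory sqlite3 mock of a few pci_devices
                         columns (not the real schema); on a real deployment
                         the equivalent DELETE would be run through the mysql
                         client against the nova database, ideally after
                         backing it up first:

```python
import sqlite3

# Mock of a few pci_devices columns (the real table has many more).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY, vendor_id TEXT, product_id TEXT,
    status TEXT, deleted INTEGER NOT NULL DEFAULT 0)""")
conn.execute(
    "INSERT INTO pci_devices VALUES (2, '10de', '0ff3', 'available', 0)")
conn.execute(
    "INSERT INTO pci_devices VALUES (9, '1002', '68c8', 'available', 0)")

# Hard delete: physically remove the stale record for the card that was
# pulled from the host, identified here by its vendor:product pair.
conn.execute(
    "DELETE FROM pci_devices WHERE vendor_id = ? AND product_id = ?",
    ("1002", "68c8"))
conn.commit()

print(conn.execute("SELECT id FROM pci_devices").fetchall())  # [(2,)]
```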

                         _______________________________________________
                         Mailing list:
                    
                         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
                         Post to     : openstack@lists.openstack.org
                         Unsubscribe :
                    
                         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack






_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
