Reviewed:  https://review.openstack.org/467220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Submitter: Jenkins
Branch:    master

commit 3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Author: Kevin_Zheng <[email protected]>
Date:   Tue May 23 20:28:28 2017 +0800

Exclude deleted service records when calling hypervisor statistics

Hypervisor statistics could be incorrect if deleted service records are not excluded from the DB query.

A user may stop the 'nova-compute' service on some compute nodes and delete the service from nova. Deleting a 'nova-compute' service soft-deletes the corresponding DB records in both the 'service' table and the 'compute_nodes' table if the compute_nodes record is old, i.e. it is linked to the service record. Modern compute_nodes records aren't linked to the services table, so deleting the services record will not delete the compute_nodes record, and the ResourceTracker won't recreate the compute_nodes record if the host and hypervisor_hostname still match the existing record, but restarting the process after deleting the service will create a new services table record with the same host/binary/topic.

If the 'nova-compute' service on that server restarts, it will automatically add a record in the 'compute_nodes' table (assuming it was deleted because it was an old-style record) and also a corresponding record in the 'service' table. If the host name of the compute node did not change, the newly created records in the 'service' and 'compute_nodes' tables will be identical to the previously soft-deleted records except the deleted row.

When calling hypervisor statistics, the DB layer joined records across the whole deployment by comparing the host field of records selected from the service table with the host field of records selected from the compute_nodes table. The calculated results could be multiplied if multiple records from the service table have the same host field, and this scenario can happen if a user performs the actions above.

Co-Authored-By: Matt Riedemann <[email protected]>

Change-Id: I9dfa15f69f8ef9c6cb36b2734a8601bd73e9d6b3
Closes-Bug: #1692397

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1692397

Title:
  hypervisor statistics could be incorrect

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed

Bug description:
  Hypervisor statistics could be incorrect:

  When we killed a nova-compute service, deleted the service from the nova DB, and then started the nova-compute service again, the result of the Hypervisor/statistics API (nova hypervisor-stats) was incorrect.

  How to reproduce:

  Step1. Check the correct statistics before we do anything:

  root@SZX1000291919:/opt/stack/nova# nova hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 1     |
  | current_workload     | 0     |
  | disk_available_least | 14    |
  | free_disk_gb         | 34    |
  | free_ram_mb          | 6936  |
  | local_gb             | 35    |
  | local_gb_used        | 1     |
  | memory_mb            | 7960  |
  | memory_mb_used       | 1024  |
  | running_vms          | 1     |
  | vcpus                | 8     |
  | vcpus_used           | 1     |
  +----------------------+-------+

  Step2. Kill the compute service:

  root@SZX1000291919:/var/log/nova# ps -ef | grep nova-com
  root 120419 120411 0 11:06 pts/27 00:00:00 sg libvirtd /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
  root 120420 120419 0 11:06 pts/27 00:00:07 /usr/bin/python /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
  root@SZX1000291919:/var/log/nova# kill -9 120419
  root@SZX1000291919:/var/log/nova# /usr/local/bin/stack: line 19: 120419 Killed sg libvirtd '/usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log' > /dev/null 2>&1
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:37.000000 | -               |
  | 8  | nova-compute     | SZX1000291919 | nova     | enabled | down  | 2017-05-22T03:23:38.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step3. Delete the service from DB:

  root@SZX1000291919:/var/log/nova# nova service-delete 8
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:17.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step4. Start the compute service again:

  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:55.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  | 10 | nova-compute     | SZX1000291919 | nova     | enabled | up    | 2017-05-22T03:48:57.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step5. Check the hypervisor statistics again; the result is incorrect:

  root@SZX1000291919:/var/log/nova# nova hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 2     |
  | current_workload     | 0     |
  | disk_available_least | 28    |
  | free_disk_gb         | 68    |
  | free_ram_mb          | 13872 |
  | local_gb             | 70    |
  | local_gb_used        | 2     |
  | memory_mb            | 15920 |
  | memory_mb_used       | 2048  |
  | running_vms          | 2     |
  | vcpus                | 16    |
  | vcpus_used           | 2     |
  +----------------------+-------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1692397/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
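
The doubling seen in this report can be reproduced outside nova with a small in-memory SQLite database. The following is only a minimal sketch of the join described in the commit message, not the actual nova DB API code: the two-table schema is heavily simplified, the helper name hypervisor_stats and its exclude_deleted_services flag are hypothetical, and treating any nonzero "deleted" value as soft-deleted is an assumption made for the illustration. The numeric values are taken from the hypervisor-stats output in this report.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real nova tables have many
# more columns. A nonzero "deleted" marks a soft-deleted row (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, deleted INTEGER);
CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY, host TEXT, vcpus INTEGER,
                            memory_mb INTEGER, local_gb INTEGER,
                            deleted INTEGER);
-- Service 8 was soft-deleted; service 10 was created when the process restarted.
INSERT INTO services VALUES (8, 'SZX1000291919', 8);
INSERT INTO services VALUES (10, 'SZX1000291919', 0);
-- A single live compute node record for the same host.
INSERT INTO compute_nodes VALUES (1, 'SZX1000291919', 8, 7960, 35, 0);
""")

STATS_SQL = """
SELECT COUNT(cn.id),
       COALESCE(SUM(cn.vcpus), 0),
       COALESCE(SUM(cn.memory_mb), 0),
       COALESCE(SUM(cn.local_gb), 0)
FROM compute_nodes AS cn
JOIN services AS s ON s.host = cn.host
WHERE cn.deleted = 0 {extra}
"""


def hypervisor_stats(conn, exclude_deleted_services):
    """Aggregate compute node totals, joining services on the host field."""
    # Without the extra filter, each compute node row is counted once per
    # matching services row, including soft-deleted ones.
    extra = "AND s.deleted = 0" if exclude_deleted_services else ""
    row = conn.execute(STATS_SQL.format(extra=extra)).fetchone()
    return dict(zip(("count", "vcpus", "memory_mb", "local_gb"), row))


print(hypervisor_stats(conn, exclude_deleted_services=False))
print(hypervisor_stats(conn, exclude_deleted_services=True))
```

With the soft-deleted service row left in the join, the single compute node is counted twice, which corresponds to the Step5 values (count 2, vcpus 16, memory_mb 15920, local_gb 70). Filtering on the service's deleted field restores the Step1 values (count 1, vcpus 8, memory_mb 7960, local_gb 35).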

