** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New
** Changed in: cloud-archive/mitaka
       Status: New => Triaged

** Changed in: cloud-archive/mitaka
   Importance: Undecided => Low

** Changed in: nova (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: nova (Ubuntu Xenial)
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1692397

Title:
  hypervisor statistics could be incorrect

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  Fix Committed
Status in OpenStack Compute (nova) ocata series:
  Fix Committed
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Xenial:
  Triaged

Bug description:
  [Impact]
  If you deploy a nova-compute service to a node, delete that service
  (via the API), and then deploy a new nova-compute service to that same
  node (i.e. same hostname), the database will then contain two service
  records: one marked as deleted and the other not. So far so good,
  until you run 'openstack hypervisor stats show', at which point the
  API aggregates the resource counts from both services.

  This has been fixed upstream and backported all the way down to
  Newton, so the problem still exists on Mitaka. Presumably the patch
  was not backported to Mitaka because the code in
  nova.db.sqlalchemy.api.compute_node_statistics() has since changed
  quite a bit. However, the old code only requires a one-line change
  (doing the same thing as the new code) to fix this issue.
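  To illustrate the double-counting, here is a minimal, self-contained
  Python sketch. The data and the simplified "schema" are hypothetical
  (not nova's real tables or code); it only models the essential point
  that the aggregation must filter on services.deleted == 0:

```python
# Hypothetical, simplified stand-in for the services/compute_nodes
# tables: after redeploying nova-compute to the same host, two service
# records exist for that hostname -- one soft-deleted, one live.
services = [
    {"id": 8,  "host": "node1", "deleted": 8},   # old record, soft-deleted
    {"id": 10, "host": "node1", "deleted": 0},   # replacement service
]
compute_nodes = {   # resources reported against each service record
    8:  {"vcpus": 8, "memory_mb": 7960},
    10: {"vcpus": 8, "memory_mb": 7960},
}

def hypervisor_stats(filter_deleted):
    """Aggregate stats over service records.

    filter_deleted=False mimics the buggy behaviour (both records are
    summed); filter_deleted=True mimics the one-line fix of skipping
    soft-deleted services.
    """
    rows = [s for s in services if not filter_deleted or s["deleted"] == 0]
    nodes = [compute_nodes[s["id"]] for s in rows]
    return {
        "count": len(nodes),
        "vcpus": sum(n["vcpus"] for n in nodes),
        "memory_mb": sum(n["memory_mb"] for n in nodes),
    }

print(hypervisor_stats(filter_deleted=False))  # buggy: count 2, vcpus 16
print(hypervisor_stats(filter_deleted=True))   # fixed: count 1, vcpus 8
```

  With the filter in place, only the live service record contributes,
  which is why the real fix is a single added WHERE condition.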
  [Test Case]
   * Deploy Mitaka with bundle http://pastebin.ubuntu.com/25968008/
   * Do 'openstack hypervisor stats show' and verify that count is 3
   * Do 'juju remove-unit nova-compute/2' to remove a compute service
     but not its physical host
   * Do 'openstack compute service delete <id>' to delete the compute
     service we just removed (choosing the correct id)
   * Do 'openstack hypervisor stats show' and verify that count is 2
   * Do 'juju add-unit nova-compute --to <machine id of deleted unit>'
   * Do 'openstack hypervisor stats show' and verify that count is 3
     (not 4 as it would have been before the fix)

  [Regression Potential]
  None anticipated, other than for clients that were interpreting the
  invalid counts as correct.

  [Other Info]
  ===========================================================================
  Hypervisor statistics could be incorrect:

  If we kill a nova-compute service, delete the service from the nova
  DB, and then start the nova-compute service again, the result of the
  Hypervisor/statistics API (nova hypervisor-stats) will be incorrect.

  How to reproduce:

  Step 1. Check the correct statistics before we do anything:

  root@SZX1000291919:/opt/stack/nova# nova hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 1     |
  | current_workload     | 0     |
  | disk_available_least | 14    |
  | free_disk_gb         | 34    |
  | free_ram_mb          | 6936  |
  | local_gb             | 35    |
  | local_gb_used        | 1     |
  | memory_mb            | 7960  |
  | memory_mb_used       | 1024  |
  | running_vms          | 1     |
  | vcpus                | 8     |
  | vcpus_used           | 1     |
  +----------------------+-------+

  Step 2.
  Kill the compute service:

  root@SZX1000291919:/var/log/nova# ps -ef | grep nova-com
  root  120419  120411  0 11:06 pts/27  00:00:00 sg libvirtd /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
  root  120420  120419  0 11:06 pts/27  00:00:07 /usr/bin/python /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
  root@SZX1000291919:/var/log/nova# kill -9 120419
  root@SZX1000291919:/var/log/nova#
  /usr/local/bin/stack: line 19: 120419 Killed  sg libvirtd '/usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log' > /dev/null 2>&1
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:37.000000 | -               |
  | 8  | nova-compute     | SZX1000291919 | nova     | enabled | down  | 2017-05-22T03:23:38.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step 3.
  Delete the service from the DB:

  root@SZX1000291919:/var/log/nova# nova service-delete 8
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:17.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step 4. Start the compute service again:

  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:55.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  | 10 | nova-compute     | SZX1000291919 | nova     | enabled | up    | 2017-05-22T03:48:57.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step 5.
  Check the hypervisor statistics again; the result is incorrect:

  root@SZX1000291919:/var/log/nova# nova hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 2     |
  | current_workload     | 0     |
  | disk_available_least | 28    |
  | free_disk_gb         | 68    |
  | free_ram_mb          | 13872 |
  | local_gb             | 70    |
  | local_gb_used        | 2     |
  | memory_mb            | 15920 |
  | memory_mb_used       | 2048  |
  | running_vms          | 2     |
  | vcpus                | 16    |
  | vcpus_used           | 2     |
  +----------------------+-------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1692397/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp