** Changed in: nova/mitaka
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1545675
Title: Resizing a pinned VM results in inconsistent state
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) mitaka series: Fix Released

Bug description:

It appears that executing certain resize operations on a pinned instance results in inconsistencies in the "state machine" that Nova uses to track instances. This was identified using Tempest and manifests itself in failures in follow-up shelve/unshelve operations.

---

# Steps

Testing was conducted on a host containing a single-node, Fedora 23-based (4.3.5-300.fc23.x86_64) OpenStack instance (built with DevStack). The '12d224e' commit of Nova was used. The Tempest tests (commit 'e913b82') were run using modified flavors, as seen below:

    nova flavor-create m1.small_nfv 420 2048 0 2
    nova flavor-create m1.medium_nfv 840 4096 0 4
    nova flavor-key 420 set "hw:numa_nodes=2"
    nova flavor-key 840 set "hw:numa_nodes=2"
    nova flavor-key 420 set "hw:cpu_policy=dedicated"
    nova flavor-key 840 set "hw:cpu_policy=dedicated"

    cd $TEMPEST_DIR
    cp etc/tempest.conf etc/tempest.conf.orig
    sed -i "s/flavor_ref = .*/flavor_ref = 420/" etc/tempest.conf
    sed -i "s/flavor_ref_alt = .*/flavor_ref_alt = 840/" etc/tempest.conf

Tests were run in the order given below:

1. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
2. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server
3. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
4. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
5. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server

Like so:

    ./run_tempest.sh -- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance

# Expected Result

The tests should pass.
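For context on why these flavors matter: "hw:cpu_policy=dedicated" asks Nova to pin each guest vCPU to a distinct host pCPU, and the resource tracker keeps a per-NUMA-cell set of pinned pCPUs. The set-based bookkeeping can be sketched as below. This is a simplified toy model for illustration only, not Nova's actual NUMACell implementation; the class and error messages are illustrative:

```python
class ToyCell(object):
    """Toy model of per-NUMA-cell CPU pinning bookkeeping (not nova's code)."""

    def __init__(self, cpuset):
        self.cpuset = set(cpuset)   # pCPUs this cell owns
        self.pinned_cpus = set()    # pCPUs currently pinned to guests

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # A CPU may only be pinned if it belongs to the cell and is free.
        if not cpus <= self.cpuset - self.pinned_cpus:
            raise ValueError("Cannot pin cpus %s: pinned set is %s"
                             % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus |= cpus

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that is not recorded as pinned means the
        # tracker's view has diverged from reality -- the error in this bug.
        if not cpus <= self.pinned_cpus:
            raise ValueError(
                "Cannot pin/unpin cpus %s from the following pinned set %s"
                % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus -= cpus
```

As long as every pin is matched by exactly one unpin, the set stays consistent; the bug below shows what happens when that invariant breaks.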
# Actual Result

    +---+--------------------------------------+--------+
    | # | test id                              | status |
    +---+--------------------------------------+--------+
    | 1 | 1164e700-0af0-4a4c-8792-35909a88743c | ok     |
    | 2 | 77eba8e0-036e-4635-944b-f7a8f3b78dc9 | ok     |
    | 3 | c03aab19-adb1-44f5-917d-c419577e9e68 | ok     |
    | 4 | 1164e700-0af0-4a4c-8792-35909a88743c | FAIL   |
    | 5 | c03aab19-adb1-44f5-917d-c419577e9e68 | ok*    |
    +---+--------------------------------------+--------+

    * this test reports as passing but is actually generating errors. Bad test! :)

One test fails while the other "passes" but raises errors. The failures, where raised, are CPUPinningInvalid exceptions:

    CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following pinned set [0, 25]

**NOTE:** I also think there are issues with the non-reverted resize test, though I've yet to investigate this:

* tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm

What's worse, this error "snowballs" on successive runs. Because of the nature of the failure (a failure to pin/unpin CPUs), we're left with a list of CPUs that Nova thinks are pinned but which are no longer actually used. This is reflected by the resource tracker:

    $ openstack server list
    $ cat /opt/stack/logs/screen/n-cpu.log | grep 'Total usable vcpus' | tail -1
    *snip* INFO nova.compute.resource_tracker [*snip*] Total usable vcpus: 40, total allocated vcpus: 8

The error messages for both are given below, along with examples of this "snowballing" CPU list:

    {0} tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance [36.713046s] ...
FAILED

Setting instance vm_state to ERROR
Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
    self._delete_instance(context, instance, bdms, quotas)
  File "/opt/stack/nova/nova/hooks.py", line 149, in inner
    rv = f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
    quotas.rollback()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
    self._update_resource_tracker(context, instance)
  File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
    rt.update_usage(context, instance)
  File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
    return f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 376, in update_usage
    self._update_usage_from_instance(context, instance)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 863, in _update_usage_from_instance
    self._update_usage(instance, sign=sign)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 705, in _update_usage
    self.compute_node, usage, free)
  File "/opt/stack/nova/nova/virt/hardware.py", line 1441, in get_host_numa_usage_from_instance
    host_numa_topology, instance_numa_topology, free=free))
  File "/opt/stack/nova/nova/virt/hardware.py", line 1307, in numa_usage_from_instances
    newcell.unpin_cpus(pinned_cpus)
  File "/opt/stack/nova/nova/objects/numa.py", line 93, in unpin_cpus
    pinned=list(self.pinned_cpus))
CPUPinningInvalid: Cannot pin/unpin cpus [0] from the following pinned set [1]

    {0} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server [29.131132s] ...
ok

Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
    self._delete_instance(context, instance, bdms, quotas)
  File "/opt/stack/nova/nova/hooks.py", line 149, in inner
    rv = f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
    quotas.rollback()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
    self._update_resource_tracker(context, instance)
  File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
    rt.update_usage(context, instance)
  File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
    return f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 376, in update_usage
    self._update_usage_from_instance(context, instance)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 863, in _update_usage_from_instance
    self._update_usage(instance, sign=sign)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 705, in _update_usage
    self.compute_node, usage, free)
  File "/opt/stack/nova/nova/virt/hardware.py", line 1441, in get_host_numa_usage_from_instance
    host_numa_topology, instance_numa_topology, free=free))
  File "/opt/stack/nova/nova/virt/hardware.py", line 1307, in numa_usage_from_instances
    newcell.unpin_cpus(pinned_cpus)
  File "/opt/stack/nova/nova/objects/numa.py", line 93, in unpin_cpus
    pinned=list(self.pinned_cpus))
CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following pinned set [0, 25]

The nth run (n ~= 6):

    CPUPinningInvalid: Cannot pin/unpin cpus [24] from the following pinned set [0, 1, 9, 8, 25]

The nth+1 run:

    CPUPinningInvalid: Cannot pin/unpin cpus [27] from the following
    pinned set [0, 1, 24, 25, 8, 9]

The nth+2 run:

    CPUPinningInvalid: Cannot pin/unpin cpus [2] from the following pinned set [0, 1, 24, 25, 8, 9, 27]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1545675/+subscriptions
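The snowballing of the pinned set above can be modelled with a short toy sketch. This is an assumption-laden illustration, not Nova's code: the premise (consistent with the tracebacks above) is that after the resize, the instance object carries stale pinning information, so the delete path tries to unpin pCPUs that were never recorded as pinned, the exception aborts the accounting update, and the instance's real pins are never released. All function names here are hypothetical:

```python
# Toy reproduction of the "snowballing" pinned set (not nova's code).
host_pinned = set()

def track_boot(cpus):
    """Record the pCPUs a newly booted instance is pinned to."""
    host_pinned.update(cpus)

def track_delete(cpus_in_instance_object):
    """Release an instance's pins; fails if the recorded pins are stale."""
    missing = set(cpus_in_instance_object) - host_pinned
    if missing:
        # Mirrors CPUPinningInvalid: the update aborts here, so the
        # instance's *real* pins are never released either.
        raise RuntimeError(
            "Cannot pin/unpin cpus %s from the following pinned set %s"
            % (sorted(missing), sorted(host_pinned)))
    host_pinned.difference_update(cpus_in_instance_object)

track_boot({0, 25})        # run n: instance really pinned to pCPUs 0 and 25
try:
    track_delete({1})      # stale topology claims pCPU 1 instead
except RuntimeError as exc:
    print(exc)
# host_pinned still contains 0 and 25: they leak, and every subsequent
# failed run adds its own CPUs, which matches the growing lists above.
```

Each run that hits the exception therefore leaves its pCPUs behind in the tracker, which is why the reported pinned set grows from [0, 25] to [0, 1, 24, 25, 8, 9, 27] over successive runs.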