[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state
** Changed in: nova/mitaka
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1545675

Title:
  Resizing a pinned VM results in inconsistent state

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  Fix Released

Bug description:
  It appears that executing certain resize operations on a pinned
  instance results in inconsistencies in the "state machine" that Nova
  uses to track instances. This was identified using Tempest and
  manifests itself in failures in follow-up shelve/unshelve operations.

  ---

  # Steps

  Testing was conducted on a host containing a single-node, Fedora
  23-based (4.3.5-300.fc23.x86_64) OpenStack instance (built with
  DevStack). The '12d224e' commit of Nova was used. The Tempest tests
  (commit 'e913b82') were run using modified flavors, as seen below:

      nova flavor-create m1.small_nfv 420 2048 0 2
      nova flavor-create m1.medium_nfv 840 4096 0 4
      nova flavor-key 420 set "hw:numa_nodes=2"
      nova flavor-key 840 set "hw:numa_nodes=2"
      nova flavor-key 420 set "hw:cpu_policy=dedicated"
      nova flavor-key 840 set "hw:cpu_policy=dedicated"

      cd $TEMPEST_DIR
      cp etc/tempest.conf etc/tempest.conf.orig
      sed -i "s/flavor_ref = .*/flavor_ref = 420/" etc/tempest.conf
      sed -i "s/flavor_ref_alt = .*/flavor_ref_alt = 840/" etc/tempest.conf

  Tests were run in the order given below.

  1. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  2. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server
  3. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
  4. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  5. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server

  Like so:

      ./run_tempest.sh -- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance

  # Expected Result

  The tests should pass.

  # Actual Result

      +---+--------------------------------------+--------+
      | # | test id                              | status |
      +---+--------------------------------------+--------+
      | 1 | 1164e700-0af0-4a4c-8792-35909a88743c |   ok   |
      | 2 | 77eba8e0-036e-4635-944b-f7a8f3b78dc9 |   ok   |
      | 3 | c03aab19-adb1-44f5-917d-c419577e9e68 |   ok   |
      | 4 | 1164e700-0af0-4a4c-8792-35909a88743c |  FAIL  |
      | 5 | c03aab19-adb1-44f5-917d-c419577e9e68 |   ok*  |
      +---+--------------------------------------+--------+

  * this test reports as passing but is actually generating errors. Bad
  test! :)

  One test fails while the other "passes" but raises errors. The
  failures, where raised, are CPUPinningInvalid exceptions:

      CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following
      pinned set [0, 25]

  **NOTE:** I also think there are issues with the non-reverted resize
  test, though I've yet to investigate this:

  * tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm

  What's worse, this error "snowballs" on successive runs. Because of
  the nature of the failure (a failure to pin/unpin CPUs), we're left
  with a list of CPUs that Nova thinks to be pinned but which are no
  longer actually used. This is reflected by the resource tracker.

      $ openstack server list

      $ cat /opt/stack/logs/screen/n-cpu.log | grep 'Total usable vcpus' | tail -1
      *snip* INFO nova.compute.resource_tracker [*snip*] Total usable vcpus: 40, total allocated vcpus: 8

  The error messages for both are given below, along with examples of
  this "snowballing" CPU list:

  {0} tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance [36.713046s] ... FAILED

      Setting instance vm_state to ERROR
      Traceback (most recent call last):
        File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
          self._delete_instance(context, instance, bdms, quotas)
        File "/opt/stack/nova/nova/hooks.py", line 149, in inner
          rv = f(*args, **kwargs)
        File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
          quotas.rollback()
        File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
          self.force_reraise()
        File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
          six.reraise(self.type_, self.value, self.tb)
        File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
          self._update_resource_tracker(context, instance)
        File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
          rt.update_usage(context, instance)
        File
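To make the "snowballing" described in the report easier to picture, here is a minimal toy model of pinned-CPU bookkeeping. It is only a sketch and not Nova's code: `ToyHostCell` and this simplified `CPUPinningInvalid` are hypothetical stand-ins, and the choice of which side holds the stale data (the instance record claiming CPU 1 while the host has 0 and 25 pinned) is an assumption made purely to reproduce the message quoted above.

```python
class CPUPinningInvalid(Exception):
    """Toy stand-in for the exception named in the report (not Nova's class)."""

    def __init__(self, requested, pinned):
        super().__init__(
            "Cannot pin/unpin cpus %s from the following pinned set %s"
            % (sorted(requested), sorted(pinned)))


class ToyHostCell:
    """Hypothetical, simplified host-side record of which CPUs are pinned."""

    def __init__(self, cpuset):
        self.cpuset = set(cpuset)
        self.pinned_cpus = set()

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # Refuse CPUs that do not exist on the host or are already pinned.
        if not cpus <= self.cpuset or cpus & self.pinned_cpus:
            raise CPUPinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus |= cpus

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Refuse to unpin anything that is not currently recorded as pinned.
        if not cpus <= self.pinned_cpus:
            raise CPUPinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus -= cpus


# Assumed scenario: the host recorded CPUs 0 and 25 as pinned, but the
# instance record (stale after the reverted resize) claims CPU 1.
cell = ToyHostCell(range(40))
cell.pin_cpus([0, 25])
stale_instance_pinning = [1]

error = None
try:
    cell.unpin_cpus(stale_instance_pinning)
except CPUPinningInvalid as exc:
    error = str(exc)

# The unpin failed, so CPUs 0 and 25 stay marked as pinned although nothing
# uses them; each later failure can leave more CPUs behind in the same way.
leaked = sorted(cell.pinned_cpus)
print(error)
print("still pinned:", leaked)
```

In this toy model the failed unpin leaves the host-side set untouched, which is one way the "total allocated vcpus" figure could stay above zero with no servers listed.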
[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state
** Also affects: nova/mitaka
   Importance: Undecided
       Status: New

** Changed in: nova/mitaka
     Assignee: (unassigned) => Stephen Finucane (stephenfinucane)

** Changed in: nova/mitaka
       Status: New => In Progress

** Changed in: nova
     Assignee: John Garbutt (johngarbutt) => Stephen Finucane (stephenfinucane)

-- 
https://bugs.launchpad.net/bugs/1545675

Title:
  Resizing a pinned VM results in inconsistent state

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  In Progress
[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state
Reviewed:  https://review.openstack.org/281483
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c7a6673fd5621d1c121c20376634ec49644fae59
Submitter: Jenkins
Branch:    master

commit c7a6673fd5621d1c121c20376634ec49644fae59
Author: Nikola Dipanov
Date:   Wed Feb 17 19:27:36 2016 +

    RT: aborting claims clears instance host and NUMA info

    When the claim is aborted, this information is no longer correct for
    the instance, so we clear it to avoid inconsistencies.

    Change-Id: I83a5f06adb22c21392d5fc867728181ea4b0454d
    Resolves-bug: 1545675

** Changed in: nova
       Status: In Progress => Fix Released

-- 
https://bugs.launchpad.net/bugs/1545675

Title:
  Resizing a pinned VM results in inconsistent state

Status in OpenStack Compute (nova):
  Fix Released
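The commit message in this notification says that aborting a claim now clears the instance's host and NUMA info. The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `ToyInstance`, `ToyTracker`, `ToyClaim`, the `pinned_cpus` field and the host name `compute-1` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ToyInstance:
    """Hypothetical instance record; pinned_cpus stands in for NUMA info."""
    host: Optional[str] = None
    pinned_cpus: Optional[List[int]] = None


class ToyTracker:
    """Hypothetical host-side tracker of pinned CPUs."""

    def __init__(self, host: str):
        self.host = host
        self.pinned: Set[int] = set()

    def claim(self, instance: ToyInstance, cpus):
        return ToyClaim(self, instance, cpus)


class ToyClaim:
    """Context manager that reserves CPUs and undoes everything on failure."""

    def __init__(self, tracker: ToyTracker, instance: ToyInstance, cpus):
        self.tracker = tracker
        self.instance = instance
        self.cpus = set(cpus)
        # Making the claim records state on both the host and the instance.
        tracker.pinned |= self.cpus
        instance.host = tracker.host
        instance.pinned_cpus = sorted(self.cpus)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.abort()
        return False  # let the original exception propagate

    def abort(self):
        # Release the host-side resources...
        self.tracker.pinned -= self.cpus
        # ...and also clear what the instance believes about itself, so the
        # two records cannot drift apart after a failed operation.
        self.instance.host = None
        self.instance.pinned_cpus = None


tracker = ToyTracker("compute-1")
instance = ToyInstance()
try:
    with tracker.claim(instance, [0, 25]):
        raise RuntimeError("simulated failure while the claim is held")
except RuntimeError:
    pass

print(tracker.pinned, instance.host, instance.pinned_cpus)
```

The design point the sketch tries to capture is that an abort has two halves: releasing host resources and resetting the instance record. Doing only the first would leave the instance pointing at CPUs it no longer holds.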