[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state

2016-09-02 Thread Stephen Finucane
** Changed in: nova/mitaka
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1545675

Title:
  Resizing a pinned VM results in inconsistent state

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  Fix Released

Bug description:
  It appears that executing certain resize operations on a pinned
  instance results in inconsistencies in the "state machine" that Nova
  uses to track instances. This was identified using Tempest and
  manifests itself in failures in follow-up shelve/unshelve operations.

  ---

  # Steps

  Testing was conducted on a host containing a single-node, Fedora
  23-based (4.3.5-300.fc23.x86_64) OpenStack instance (built with
  DevStack). The '12d224e' commit of Nova was used. The Tempest tests
  (commit 'e913b82') were run using modified flavors, as seen below:

  nova flavor-create m1.small_nfv 420 2048 0 2
  nova flavor-create m1.medium_nfv 840 4096 0 4
  nova flavor-key 420 set "hw:numa_nodes=2"
  nova flavor-key 840 set "hw:numa_nodes=2"
  nova flavor-key 420 set "hw:cpu_policy=dedicated"
  nova flavor-key 840 set "hw:cpu_policy=dedicated"

  cd $TEMPEST_DIR
  cp etc/tempest.conf etc/tempest.conf.orig
  sed -i "s/flavor_ref = .*/flavor_ref = 420/" etc/tempest.conf
  sed -i "s/flavor_ref_alt = .*/flavor_ref_alt = 840/" etc/tempest.conf

  Tests were run in the order given below.

  1. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  2. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server
  3. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
  4. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  5. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server

  Like so:

  ./run_tempest.sh -- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance

  # Expected Result

  The tests should pass.

  # Actual Result

  +---+--------------------------------------+--------+
  | # | test id                              | status |
  +---+--------------------------------------+--------+
  | 1 | 1164e700-0af0-4a4c-8792-35909a88743c |   ok   |
  | 2 | 77eba8e0-036e-4635-944b-f7a8f3b78dc9 |   ok   |
  | 3 | c03aab19-adb1-44f5-917d-c419577e9e68 |   ok   |
  | 4 | 1164e700-0af0-4a4c-8792-35909a88743c |  FAIL  |
  | 5 | c03aab19-adb1-44f5-917d-c419577e9e68 |   ok*  |
  +---+--------------------------------------+--------+

  * this test reports as passing but is actually generating errors. Bad
  test! :)

  One test fails while the other "passes" but raises errors. The
  failures, where raised, are CPUPinningInvalid exceptions:

  CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following
  pinned set [0, 25]
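
  As a rough illustration of how a message like this can arise, here is
  a minimal sketch of set-based pin bookkeeping. The `ToyCell` class and
  its methods are hypothetical and are not Nova's implementation; the
  exception class is a local stand-in that only reuses the name and
  message text quoted above.

```python
class CPUPinningInvalid(Exception):
    """Local stand-in for the exception named above (not Nova's class)."""

    def __init__(self, requested, pinned):
        super().__init__(
            "Cannot pin/unpin cpus %s from the following pinned set %s"
            % (sorted(requested), sorted(pinned)))


class ToyCell:
    """Hypothetical model of one host cell's CPU pinning bookkeeping."""

    def __init__(self, cpuset):
        self.cpuset = set(cpuset)
        self.pinned = set()

    def pin(self, cpus):
        cpus = set(cpus)
        # Pinning a CPU that is already recorded as pinned is inconsistent.
        if cpus & self.pinned:
            raise CPUPinningInvalid(cpus, self.pinned)
        self.pinned |= cpus

    def unpin(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that was never recorded as pinned is inconsistent.
        if not cpus <= self.pinned:
            raise CPUPinningInvalid(cpus, self.pinned)
        self.pinned -= cpus


cell = ToyCell(range(40))
cell.pin([0, 25])        # what the host-side record believes is pinned
try:
    cell.unpin([1])      # what a stale instance record asks to release
except CPUPinningInvalid as exc:
    message = str(exc)
    print(message)
```

  In this toy model the failed unpin leaves CPUs 0 and 25 marked as
  pinned, which is the kind of leftover state described in this report.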

  **NOTE:** I also think there are issues with the non-reverted resize
  test, though I've yet to investigate this:

  * tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm

  What's worse, this error "snowballs" on successive runs. Because of
  the nature of the failure (a failure to pin/unpin CPUs), we're left
  with a list of CPUs that Nova thinks to be pinned but which are no
  longer actually used. This is reflected by the resource tracker.

  $ openstack server list

  $ cat /opt/stack/logs/screen/n-cpu.log | grep 'Total usable vcpus' | tail -1
  *snip* INFO nova.compute.resource_tracker [*snip*] Total usable vcpus: 40, total allocated vcpus: 8
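
  The same check can be scripted. Below is a small sketch that pulls the
  two counters out of a resource tracker summary line; the function
  names are made up for illustration, and the "expected" value of 0
  assumes that the empty `openstack server list` output above means no
  instances remain on the host.

```python
import re

# Matches the summary line format quoted in the log excerpt above.
VCPU_RE = re.compile(
    r"Total usable vcpus: (\d+), total allocated vcpus: (\d+)")


def parse_vcpu_usage(line):
    """Return (usable, allocated) from a log line, or None if absent."""
    match = VCPU_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def leaked_vcpus(line, expected_allocated=0):
    """Return how many vCPUs are allocated beyond what is expected."""
    usage = parse_vcpu_usage(line)
    if usage is None:
        raise ValueError("no vcpu summary found in line")
    _, allocated = usage
    return allocated - expected_allocated


sample = ("*snip* INFO nova.compute.resource_tracker [*snip*] "
          "Total usable vcpus: 40, total allocated vcpus: 8")
print(parse_vcpu_usage(sample))
print(leaked_vcpus(sample))
```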

  The error messages for both are given below, along with examples of
  this "snowballing" CPU list:

  {0}
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  [36.713046s] ... FAILED

   Setting instance vm_state to ERROR
   Traceback (most recent call last):
     File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
       self._delete_instance(context, instance, bdms, quotas)
     File "/opt/stack/nova/nova/hooks.py", line 149, in inner
       rv = f(*args, **kwargs)
     File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
       quotas.rollback()
     File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
       self.force_reraise()
     File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
       six.reraise(self.type_, self.value, self.tb)
     File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
       self._update_resource_tracker(context, instance)
     File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
       rt.update_usage(context, instance)
     File 

[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state

2016-08-02 Thread Matt Riedemann
** Also affects: nova/mitaka
   Importance: Undecided
   Status: New

** Changed in: nova/mitaka
 Assignee: (unassigned) => Stephen Finucane (stephenfinucane)

** Changed in: nova/mitaka
   Status: New => In Progress

** Changed in: nova
 Assignee: John Garbutt (johngarbutt) => Stephen Finucane (stephenfinucane)

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  In Progress

[Yahoo-eng-team] [Bug 1545675] Re: Resizing a pinned VM results in inconsistent state

2016-03-07 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/281483
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c7a6673fd5621d1c121c20376634ec49644fae59
Submitter: Jenkins
Branch:    master

commit c7a6673fd5621d1c121c20376634ec49644fae59
Author: Nikola Dipanov 
Date:   Wed Feb 17 19:27:36 2016 +

RT: aborting claims clears instance host and NUMA info

When the claim is aborted, this information is no longer correct for the
instance, so we clear it to avoid inconsistencies.

Change-Id: I83a5f06adb22c21392d5fc867728181ea4b0454d
Resolves-bug: 1545675
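
The idea in the commit message above can be sketched as follows. This is
an illustrative model only: `ToyInstance`, `ToyClaim`, and their fields
are hypothetical names rather than Nova's classes, and the
abort-on-exception context manager behaviour is an assumption made to
keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyInstance:
    """Hypothetical stand-in for an instance record."""
    uuid: str
    host: Optional[str] = None
    numa_topology: Optional[dict] = None


class ToyClaim:
    """Hypothetical resource claim that records placement on the instance."""

    def __init__(self, instance, host, numa_topology):
        self.instance = instance
        instance.host = host
        instance.numa_topology = numa_topology

    def abort(self):
        # Once the claim is aborted, the placement data no longer
        # describes reality, so clear it instead of leaving it stale.
        self.instance.host = None
        self.instance.numa_topology = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.abort()
        return False  # never swallow the original exception


inst = ToyInstance(uuid="hypothetical-uuid")
try:
    with ToyClaim(inst, host="compute-1", numa_topology={"pinned": [0, 25]}):
        raise RuntimeError("operation failed after claiming resources")
except RuntimeError:
    pass

print(inst.host, inst.numa_topology)
```

With the stale fields cleared, a later cleanup pass has nothing
inconsistent to unpin for this instance.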


** Changed in: nova
   Status: In Progress => Fix Released

Status in OpenStack Compute (nova):
  Fix Released
