Reviewed:  https://review.opendev.org/706868
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e65d4a131a7ebc02261f5df69fa1b394a502f268
Submitter: Zuul
Branch:    master

commit e65d4a131a7ebc02261f5df69fa1b394a502f268
Author: Balazs Gibizer <[email protected]>
Date:   Mon Feb 10 15:48:04 2020 +0100

    Clean up allocation if unshelve fails due to neutron
    
    When port binding update fails during unshelve of a shelve offloaded
    instance compute manager has to catch the exception and clean up the
    destination host allocation.
    
    Change-Id: I4c3fbb213e023ac16efc0b8561f975a659311684
    Closes-Bug: #1862633


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862633

Title:
  unshelve leak allocation if update port fails

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  If updating the port binding fails during unshelve of an offloaded
  server, nova leaks the placement allocation on the destination host.

  Steps to reproduce
  ==================
  1) Boot a server with a neutron port.
  2) Shelve and offload the server.
  3) Disable the original host of the server to force the scheduler to
     select a different host during unshelve. This is important because it
     triggers a non-empty port update during unshelve.
  4) Unshelve the server and inject a network fault into the communication
     between nova and neutron. You can also simply shut down neutron-server
     at the right moment, that is, just before the target compute tries to
     send the port update.
  5) Observe that the unshelve fails and the server goes back to the
     offloaded state, but the placement allocation on the target host
     remains.

  Triage: the problem is caused by missing fault handling code in the
  compute manager[1]. The compute manager has proper error handling if
  the unshelve fails in the virt driver spawn call, but it does not
  handle the failure if the neutron communication fails. The compute
  manager method simply logs and re-raises the neutron exceptions.
  Because the unshelve_instance compute RPC is a cast, no caller is
  waiting for the result, so the re-raised exception is dropped and
  nothing cleans up the allocation.
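
  A minimal sketch of the cleanup pattern described above, not the actual
  nova code. FakePlacement, PortBindingFailed, and unshelve() are
  hypothetical names made up for illustration, and the neutron and virt
  driver calls are passed in as plain callables. The sketch only shows the
  idea: a failure in the port update gets the same allocation cleanup as a
  failure in spawn.

```python
class PortBindingFailed(Exception):
    """Hypothetical stand-in for a neutron port update failure."""


class FakePlacement:
    """Hypothetical in-memory stand-in for the placement service."""

    def __init__(self):
        # Maps (instance_id, host) to the resources allocated there.
        self.allocations = {}

    def allocate(self, instance_id, host, resources):
        self.allocations[(instance_id, host)] = resources

    def delete_allocation(self, instance_id, host):
        self.allocations.pop((instance_id, host), None)


def unshelve(placement, instance_id, host, update_port_binding, spawn):
    """Unshelve on `host`; release its allocation if any step fails."""
    try:
        # Either step may raise; both need the same cleanup.
        update_port_binding(instance_id, host)
        spawn(instance_id, host)
    except Exception:
        # Clean up the destination host allocation, then re-raise so
        # the failure is still visible to whoever logs it.
        placement.delete_allocation(instance_id, host)
        raise


if __name__ == "__main__":
    placement = FakePlacement()
    placement.allocate("inst-1", "host-b", {"VCPU": 1})

    def neutron_down(instance_id, host):
        raise PortBindingFailed("neutron unreachable")

    try:
        unshelve(placement, "inst-1", "host-b", neutron_down,
                 lambda instance_id, host: None)
    except PortBindingFailed:
        pass

    # The allocation on the destination host is gone after the failure.
    print(placement.allocations)
```

  Doing the cleanup inside the callee matters here because, with a cast,
  there is no caller left to react to the exception.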

  [1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
