** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/stein
       Status: New => Fix Released

** Changed in: nova/train
       Status: New => Fix Committed

** Changed in: nova/rocky
       Status: New => Fix Committed

** Changed in: nova/pike
       Status: New => Fix Committed

** Changed in: nova/queens
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862633

Title:
  unshelve leak allocation if update port fails

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Fix Committed
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released

Bug description:
  If updating the port binding fails during unshelve of an offloaded
  server, nova leaks the placement allocation.

  Steps to reproduce
  ==================

  1) Boot a server with a neutron port.

  2) Shelve and offload the server.

  3) Disable the original host of the server to force scheduling during
  unshelve to select a different host. This is important because it
  triggers a non-empty port update during unshelve.

  4) Unshelve the server and inject a network fault into the
  communication between nova and neutron. You can also simply shut down
  neutron-server at the right moment, that is, just before the target
  compute tries to send the port update.

  5) Observe that the unshelve fails and the server goes back to the
  offloaded state, but the placement allocation on the target host
  remains.

  Triage: the problem is caused by missing fault handling code in the
  compute manager [1].
  The compute manager has proper error handling if the unshelve fails in
  the virt driver spawn call, but it does not handle a failure in the
  neutron communication. The compute manager method simply logs and
  re-raises the neutron exceptions. Because the unshelve_instance compute
  RPC is a cast, no caller is waiting for a reply, so the re-raised
  exception is dropped and nothing cleans up the allocation.

  [1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions
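
The triage in the bug description above can be illustrated with a small, self-contained sketch. This is not nova code: FakePlacement, PortUpdateFailed, unshelve_on_compute and the callables passed to it are hypothetical stand-ins invented for illustration. The sketch assumes the allocation on the target host already exists when the compute side runs, and shows one way the port update and the spawn call could share the same cleanup path so that a neutron failure no longer leaves the allocation behind.

```python
class PortUpdateFailed(Exception):
    """Hypothetical stand-in for a neutron communication error."""


class FakePlacement:
    """Hypothetical in-memory stand-in for placement allocations."""

    def __init__(self):
        self.allocations = {}  # server id -> host holding the allocation

    def claim(self, server_id, host):
        self.allocations[server_id] = host

    def delete(self, server_id):
        self.allocations.pop(server_id, None)


def unshelve_on_compute(server_id, host, placement, update_port_binding, spawn):
    """Run the compute-side unshelve steps with shared cleanup.

    Assumes the allocation for (server_id, host) was claimed earlier.
    Any failure, in the port update or in spawn, releases that
    allocation before the exception is re-raised.
    """
    try:
        update_port_binding(server_id, host)
        spawn(server_id, host)
    except Exception:
        # The caller used a cast and will never see this exception,
        # so the cleanup has to happen here.
        placement.delete(server_id)
        raise
    return "ACTIVE"


def failing_port_update(server_id, host):
    """Simulate neutron being unreachable during the port update."""
    raise PortUpdateFailed("neutron unreachable")


def noop(server_id, host):
    """Simulate a step that succeeds."""


# Failure path: the allocation is claimed, then the port update fails.
placement = FakePlacement()
placement.claim("server-1", "host-2")
try:
    unshelve_on_compute("server-1", "host-2", placement, failing_port_update, noop)
except PortUpdateFailed:
    pass
remaining_after_failure = dict(placement.allocations)

# Success path: the allocation stays in place.
placement.claim("server-2", "host-2")
state = unshelve_on_compute("server-2", "host-2", placement, noop, noop)
```

In the failure path the allocation for server-1 is released, while in the success path the allocation for server-2 is kept. Without the except branch around the port update, the first case would reproduce the leak described in step 5.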