Reviewed:  https://review.openstack.org/543971
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=caf167862dd82e98f0189c9598856de57dfa7d35
Submitter: Zuul
Branch:    master

commit caf167862dd82e98f0189c9598856de57dfa7d35
Author: Claudiu Belu <cb...@cloudbasesolutions.com>
Date:   Sun Feb 11 10:07:52 2018 -0800

    compute: Cleans up allocations after failed resize

    During cold resize, the ComputeManager's prep_resize calls the
    rpcapi.ComputeAPI's resize_instance method, which will then do an
    RPC cast (async).

    Because the RPC cast is asynchronous, the exception branch in
    prep_resize will not be executed if the cold resize failed, the
    allocations will not be cleaned up, and the instance will not be
    rescheduled.

    This patch adds allocation cleanup in the resize_instance and
    finish_resize methods.

    Change-Id: I2d9ab06b485f76550dbbff46f79f40ff4c97d12f
    Closes-Bug: #1749215

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749215

Title:
  Allocations not deleted on failed resize_instance

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed

Bug description:
  Description
  ===========

  During a resize, an instance's allocations are removed and replaced
  by 2 sets of allocations. If the resize completes successfully, one
  set of allocations is correctly removed, but if it fails, neither set
  is removed. Only one set of allocations is removed when the instance
  is deleted.

  This happens because the call self.compute_rpcapi.resize_instance [1]
  is an RPC cast (async) instead of a call (sync). Because of this, the
  except branch [2], in which the allocation is cleared and the
  instance is rescheduled, is never executed.

  Additionally, because not all of the allocations are cleared, the
  resources on the compute nodes will become "locked" and unusable.
  At some point, instances will no longer be scheduled to those compute
  nodes, due to all the resources being "allocated".

  [1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4085
  [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4123

  Steps to reproduce
  ==================

  * Spawn an instance.
  * Observe that the table nova_api.allocations only has 1 set of
    allocations for the instance (VCPU, MEMORY, DISK).
  * Cold resize to an invalid flavor (e.g.: smaller disk).
  * Observe that the table nova_api.allocations has 2 sets of
    allocations for the instance.
  * Observe that the cold resize failed, and that the instance's task
    state has been reverted to its original state.
  * Observe that the table nova_api.allocations continues to have 2
    sets of allocations.
  * Delete the instance.
  * Observe that even after the instance has been destroyed, there is
    still 1 set of allocations for the instance.

  Expected result
  ===============

  After the cold resize failed, there should be only 1 set of
  allocations in the nova_api.allocations table, and after deleting the
  instance, there shouldn't be any.

  Actual result
  =============

  After the cold resize failed, there are 2 sets of allocations in the
  nova_api.allocations table, and after deleting the instance, there is
  still 1 set of allocations.

  Environment
  ===========

  Branch: Queens
  Hypervisor: Hyper-V Server 2012 R2 (unrelated)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1749215/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
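
For readers unfamiliar with the cast/call distinction behind this bug, the
failure mode can be sketched in plain Python. This is only an illustration
under stated assumptions, not nova's code: rpc_call, rpc_cast, ResizeFailed,
and the "first_set"/"second_set" labels are hypothetical stand-ins, the
"remote" side is just a thread, and prep_resize/resize_instance only borrow
the names of the real methods. Which allocation set gets dropped is arbitrary
here. The sketch shows that a caller's except branch cannot see a failure
behind a cast, and that cleanup placed in the callee (the approach the commit
message describes) still runs.

```python
import threading


class ResizeFailed(Exception):
    """Hypothetical error raised by the simulated resize."""


def resize_instance(allocations, cleanup_on_failure):
    """Simulated remote method: always fails, optionally cleaning up."""
    try:
        raise ResizeFailed("new flavor has a smaller disk")
    except ResizeFailed:
        if cleanup_on_failure:
            # Callee-side cleanup, in the spirit of the fix.
            allocations.discard("second_set")
        raise


def rpc_call(func, *args):
    """Synchronous stand-in: the remote exception reaches the caller."""
    return func(*args)


def rpc_cast(func, *args):
    """Asynchronous stand-in: fire and forget, errors stay remote."""
    def worker():
        try:
            func(*args)
        except Exception:
            pass  # the caller never learns about this failure

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def prep_resize(send, cleanup_on_failure=False):
    """Simulated caller whose except branch clears the extra allocations."""
    allocations = {"first_set", "second_set"}  # 2 sets during a resize
    caller_saw_error = False
    try:
        pending = send(resize_instance, allocations, cleanup_on_failure)
    except ResizeFailed:
        caller_saw_error = True
        allocations.discard("second_set")
    else:
        if pending is not None:
            pending.join()  # wait only so the result is deterministic
    return caller_saw_error, allocations


if __name__ == "__main__":
    print("call:", prep_resize(rpc_call))
    print("cast:", prep_resize(rpc_cast))
    print("cast + callee cleanup:", prep_resize(rpc_cast, True))
```

With the synchronous stand-in the caller's except branch runs and one set
remains; with the cast both sets are left behind unless the callee cleans up
itself.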