Reviewed:  https://review.openstack.org/488510
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5390210a4fa46e2af6b6aec9b41c03147b52760c
Submitter: Jenkins
Branch:    master
commit 5390210a4fa46e2af6b6aec9b41c03147b52760c
Author: Jay Pipes <[email protected]>
Date:   Wed Aug 2 17:48:38 2017 -0400

    Remove provider allocs in confirm/revert resize

    Now that the scheduler creates a doubled-up allocation for the
    duration of a move operation (with part of the allocation referring
    to the source and part referring to the destination host), we need
    to remove the source provider when confirming the resize and remove
    the destination provider from the allocation when reverting a
    resize. This patch adds this logic in the RT's drop_move_claim()
    method.

    Change-Id: I6f8afe6680f83125da9381c812016b3623503825
    Co-Authored-By: Dan Smith <[email protected]>
    Fixes-bug: #1707071

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1707071

Title:
  Compute nodes will fight over allocations during migration

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed

Bug description:
  As far back as Ocata, compute nodes that manage allocations will end
  up overwriting allocations from other compute nodes when doing a
  migration. This stems from the fact that the Resource Tracker was
  designed to manage a per-compute-node set of accounting, but
  placement is per-instance accounting. When we try to
  create/update/delete allocations for instances on compute nodes from
  the existing resource tracker code paths, we end up deleting
  allocations that apply to other compute nodes in the process.

  For example, when an instance A is running against compute1, there is
  an allocation for its resources against that node. When migrating
  that instance to compute2, the target compute (or scheduler) may
  create allocations for instance A against compute2, which overwrite
  those for compute1.
  Then, compute1's periodic healing task runs and deletes the
  allocation for instance A against compute2, replacing it with one for
  compute1. When the migration completes, compute2 heals again and
  overwrites the allocation with one for the new home of the instance.
  Then, compute1 may delete the allocation it thinks it owns, followed
  finally by another heal on compute2. While this is going on, the
  scheduler (via placement) does not have a consistent view of
  resources to make proper decisions.

  In order to fix this, we need a combination of changes:

  1. There should be allocations against both compute nodes for an
     instance during a migration.
  2. Compute nodes should respect the double claim and not delete
     allocations for instances they used to own if the allocation has
     no resources for their resource provider.
  3. Compute nodes should not delete allocations for instances unless
     they own the instance _and_ the instance is in the
     DELETED/SHELVED_OFFLOADED state.
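A toy model may help illustrate the three rules in the bug description together with the confirm/revert cleanup described in the commit message. This is a minimal sketch under stated assumptions, not Nova's resource tracker or the placement REST API: FakePlacement, start_move(), drop_provider(), confirm_resize(), revert_resize() and cleanup() are hypothetical names, and allocations are held in memory as consumer -> provider -> resources. The one behavior it borrows from placement is that writing a consumer's allocations replaces the whole set, which is why a node that rewrites only its own view can clobber another node's share.

```python
from typing import Dict

# Hypothetical toy types: resource class -> amount, and
# provider (compute node) -> resources for one consumer (instance).
Resources = Dict[str, int]
ByProvider = Dict[str, Resources]

# States named in the bug; the lowercase strings are an assumption of this sketch.
DELETABLE_STATES = {"deleted", "shelved_offloaded"}


class FakePlacement:
    """Hypothetical in-memory stand-in for placement; not the real API."""

    def __init__(self) -> None:
        self.allocs: Dict[str, ByProvider] = {}

    def put(self, consumer: str, by_provider: ByProvider) -> None:
        # Writing a consumer's allocations replaces the whole set.
        self.allocs[consumer] = {rp: dict(r) for rp, r in by_provider.items()}

    def get(self, consumer: str) -> ByProvider:
        # Return a copy so callers can modify it before writing it back.
        return {rp: dict(r) for rp, r in self.allocs.get(consumer, {}).items()}

    def delete(self, consumer: str) -> None:
        self.allocs.pop(consumer, None)


def start_move(p: FakePlacement, instance: str, source: str, dest: str,
               resources: Resources) -> None:
    # Rule 1: a doubled-up allocation against both nodes during the move.
    p.put(instance, {source: resources, dest: resources})


def drop_provider(p: FakePlacement, instance: str, provider: str) -> None:
    # Remove one provider's share and keep the rest (read-modify-write).
    current = p.get(instance)
    current.pop(provider, None)
    if current:
        p.put(instance, current)
    else:
        p.delete(instance)


def confirm_resize(p: FakePlacement, instance: str, source: str) -> None:
    # Confirm: the instance stays on the destination, so drop the source share.
    drop_provider(p, instance, source)


def revert_resize(p: FakePlacement, instance: str, dest: str) -> None:
    # Revert: the instance returns to the source, so drop the destination share.
    drop_provider(p, instance, dest)


def cleanup(p: FakePlacement, instance: str, my_node: str,
            owns_instance: bool, vm_state: str) -> None:
    # Rule 2: if the allocation has nothing for this node, leave it alone.
    if my_node not in p.get(instance):
        return
    # Rule 3: delete only if this node owns the instance and it is
    # deleted or shelved-offloaded.
    if owns_instance and vm_state in DELETABLE_STATES:
        p.delete(instance)


if __name__ == "__main__":
    p = FakePlacement()
    start_move(p, "instance-a", "compute1", "compute2",
               {"VCPU": 2, "MEMORY_MB": 2048})
    confirm_resize(p, "instance-a", "compute1")
    print(p.get("instance-a"))  # {'compute2': {'VCPU': 2, 'MEMORY_MB': 2048}}
    # compute1's cleanup no longer touches the instance's allocation.
    cleanup(p, "instance-a", "compute1", owns_instance=False, vm_state="active")
    print("instance-a" in p.allocs)  # True
```

Dropping one provider's share is modeled as a read-modify-write of the consumer's full allocation; the sketch ignores concurrent writers, which a real implementation would have to handle.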

