Public bug reported:

This is split off from bug 1741125, which is more about reschedules failing.
Resize / cold migrate with the CachingScheduler simply doesn't work, because the conductor assumes the scheduler created allocations for the instance and tries to swap them to the migration record, which fails here:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/conductor/tasks/migrate.py#L53

The instance won't have an allocation on the source node created by the scheduler when using the CachingScheduler, because the CachingScheduler doesn't use Placement. And once all computes are upgraded to at least Pike, the computes no longer create allocations in Placement either, because they assume the scheduler is going to do that. So in this case, we basically need to just log something and continue without swapping allocations.

The compute manager code should be OK since it just no-ops if the migration record doesn't have an allocation:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L3965

That will, unfortunately, eventually lead to the compute asking the resource tracker to remove the allocation for the instance, which won't exist either, and we'll get an ERROR in the logs:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L4103
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/resource_tracker.py#L1339

** Affects: nova
     Importance: High
     Assignee: Matt Riedemann (mriedem)
         Status: Triaged

** Tags: cachingscheduler queens-rc-potential resize

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1741307

Title:
  Resize always fails when using the CachingScheduler

Status in OpenStack Compute (nova):
  Triaged
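The "log something and continue" handling could look roughly like the standalone sketch below. The dict-based placement store and the function name are hypothetical illustrations, not actual nova code; the point is only the branch that tolerates a missing instance allocation instead of failing the swap:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)


def swap_instance_allocations_to_migration(placement, instance_uuid,
                                           migration_uuid):
    """Move the instance's allocations onto the migration consumer.

    ``placement`` is a stand-in dict mapping consumer UUID to its
    allocations. Returns True if the swap happened, False if there was
    nothing to swap (e.g. the CachingScheduler never created any
    allocations for the instance).
    """
    allocations = placement.get(instance_uuid)
    if not allocations:
        # CachingScheduler case: the scheduler doesn't use Placement,
        # so just log and continue rather than fail the migration.
        LOG.debug('Unable to find existing allocations for instance '
                  '%s; continuing without swapping allocations.',
                  instance_uuid)
        return False
    # Normal case: re-home the allocations onto the migration record.
    placement[migration_uuid] = placement.pop(instance_uuid)
    return True
```

Returning False (rather than raising) is what lets the later compute-side code no-op cleanly when the migration record carries no allocation.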
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1741307/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp