Public bug reported:

This is split off from bug 1741125, which is more about reschedules failing.
Resize / cold migrate with the CachingScheduler simply doesn't work, because the conductor assumes the scheduler created allocations for the instance and tries to swap them to the migration record, which fails here:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/conductor/tasks/migrate.py#L53

The instance won't have an allocation on the source node created by the scheduler when using the CachingScheduler, because the CachingScheduler doesn't use Placement. And once all computes are upgraded to at least Pike, the computes no longer create allocations in Placement either, because they assume the scheduler is going to do that. So in this case, we basically need to just log something and continue without swapping allocations.

The compute manager code should be OK since it just no-ops if the migration record doesn't have an allocation:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L3965

That will, unfortunately, eventually lead to the compute asking the resource tracker to remove the allocation for the instance, which won't exist either, and we'll get an ERROR in the logs:

https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L4103
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/resource_tracker.py#L1339

** Affects: nova
     Importance: High
     Assignee: Matt Riedemann (mriedem)
         Status: Triaged

** Tags: cachingscheduler queens-rc-potential resize

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1741307

Title:
  Resize always fails when using the CachingScheduler

Status in OpenStack Compute (nova):
  Triaged
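The "log something and continue" handling could look roughly like the standalone sketch below. The dict-based placement store and the function name are hypothetical illustrations, not actual nova code; the point is only the branch that tolerates a missing instance allocation instead of failing the swap:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)


def swap_instance_allocations_to_migration(placement, instance_uuid,
                                           migration_uuid):
    """Move the instance's allocations onto the migration consumer.

    ``placement`` is a stand-in dict mapping consumer UUID to its
    allocations. Returns True if the swap happened, False if there was
    nothing to swap (e.g. the CachingScheduler never created any
    allocations for the instance).
    """
    allocations = placement.get(instance_uuid)
    if not allocations:
        # CachingScheduler case: the scheduler doesn't use Placement,
        # so just log and continue rather than fail the migration.
        LOG.debug('Unable to find existing allocations for instance '
                  '%s; continuing without swapping allocations.',
                  instance_uuid)
        return False
    # Normal case: re-home the allocations onto the migration record.
    placement[migration_uuid] = placement.pop(instance_uuid)
    return True
```

Returning False (rather than raising) is what lets the later compute-side code no-op cleanly when the migration record carries no allocation.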
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1741307/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp