Reviewed:  https://review.openstack.org/297387
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=446d15568e00a483d909dc5c565d70baf29179ad
Submitter: Jenkins
Branch:    master
commit 446d15568e00a483d909dc5c565d70baf29179ad
Author: Sylvain Bauza <[email protected]>
Date:   Thu Mar 24 23:07:54 2016 +0100

    Stop providing force_hosts to the scheduler for move ops

    Since we now provide the original RequestSpec for move operations
    (unshelve, live-migrate and evacuate), it can also provide the original
    force_hosts/nodes to the scheduler. In that case, if an admin asked to
    boot an instance forced to a host, a later move operation would pass
    the forced value again and could then not pick a different
    destination, which is an issue.

    TBH, that is not a problem for live-migrate and evacuate, which accept
    an optional host value (which then bypasses the scheduler), but since
    unshelve has no such optional value, it would mean that a forced
    instance could only be unshelved to the same host.

    Change-Id: I03c22ff757d0ee1da9d69fa48cc4bdd036e6b13f
    Closes-Bug: #1561357

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1561357

Title:
  VM deployed with availability-zone (force_hosts) cannot be live migrated to an untargeted host

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Steps:
  1) Deploy a VM to a specific host using availability zones (i.e., do a
     targeted deploy).
  2) Attempt to live migrate the VM from (1), letting the scheduler decide
     which host to live migrate to (i.e., do an untargeted live migration).

  Outcome: The live migration always fails.

  Version: mitaka

  This is happening because of the following recent change:
  https://github.com/openstack/nova/commit/111a852e79f0d9e54228d8e2724dc4183f737397
  That change pulls the request spec of the original deploy from the DB
  and reuses it for the live migration.
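  How a targeted deploy ends up with force_hosts in the persisted spec can
  be sketched as follows. This is a simplified model, not nova's code: it
  assumes the admin-only "zone:host[:node]" form of --availability-zone used
  in the devstack example below (nova:host613), and parse_zone_hint,
  SpecSketch and spec_for_boot are hypothetical names. The resulting fields
  match the request_specs row dumped in that example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def parse_zone_hint(hint: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical helper: split an 'az[:host[:node]]' hint; empty parts become None."""
    parts = hint.split(":", 2)
    az = parts[0]
    host = parts[1] if len(parts) > 1 and parts[1] else None
    node = parts[2] if len(parts) > 2 and parts[2] else None
    return az, host, node


@dataclass
class SpecSketch:
    """Hypothetical subset of the persisted RequestSpec fields from the DB dump."""
    availability_zone: Optional[str] = None
    force_hosts: Optional[List[str]] = None
    force_nodes: Optional[List[str]] = None
    ignore_hosts: Optional[List[str]] = None


def spec_for_boot(hint: str) -> SpecSketch:
    """Build the spec a targeted boot would persist (simplified model)."""
    az, host, node = parse_zone_hint(hint)
    return SpecSketch(
        availability_zone=az,
        force_hosts=[host] if host else None,
        force_nodes=[node] if node else None,
    )


spec = spec_for_boot("nova:host613")
print(spec)
# SpecSketch(availability_zone='nova', force_hosts=['host613'], force_nodes=None, ignore_hosts=None)
```

  Nothing in such a spec records that force_hosts was only meant for the
  initial boot, so any later operation that reuses it inherits the
  constraint.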
  Since the initial deploy of the VM was targeted, the request spec object
  saved in the DB has the "force_hosts" field set to that host. Part of the
  live migrate flow sets the "ignore_hosts" field of the same request spec
  object to that same host, since it doesn't make sense to live migrate to
  the source host. This results in unsolvable constraints for the
  scheduler: the only host it is allowed to pick is also a host it must
  ignore.

  nova/compute/api.py::live_migrate():

      ...
      try:
          # this fetches the request spec from the DB, which will have force_hosts set
          request_spec = objects.RequestSpec.get_by_instance_uuid(
              context, instance.uuid)
      ...
      self.compute_task_api.live_migrate_instance(context, instance, host_name,
                                                  block_migration=block_migration,
                                                  disk_over_commit=disk_over_commit,
                                                  request_spec=request_spec)

  After a lot of API plumbing, the flow ends up in
  nova/conductor/tasks/live_migrate.py::_find_destination():

      ...
      attempted_hosts = [self.source]
      ...
      host = None
      while host is None:
          ...
          # we're putting the source host into the "ignore_hosts" field
          request_spec.ignore_hosts = attempted_hosts
          try:
              # we're now passing an unsolvable request_spec to the scheduler,
              # which will never find a valid host to migrate to
              host = self.scheduler_client.select_destinations(
                  self.context, request_spec)[0]['host']

  Example on a two-node devstack environment:

    stack@controller:~/devstack$ nova boot tdp-server --image 13a9f724-36ef-46ae-896d-f4f003ac1a10 --flavor m1.tiny --availability-zone nova:host613

    stack@controller:~/devstack$ nova list --fields name,status,OS-EXT-SRV-ATTR:host
    +--------------------------------------+------------+--------+-----------------------+
    | ID                                   | Name       | Status | OS-EXT-SRV-ATTR: Host |
    +--------------------------------------+------------+--------+-----------------------+
    | a9fe19e4-5528-40f2-af08-031eaf4c33a6 | tdp-server | ACTIVE | host613               |
    +--------------------------------------+------------+--------+-----------------------+

    mysql> select spec from request_specs where instance_uuid="a9fe19e4-5528-40f2-af08-031eaf4c33a6";
    {
      ...
      "nova_object.name": "RequestSpec",
      "nova_object.data": {
        "instance_uuid": "a9fe19e4-5528-40f2-af08-031eaf4c33a6",
        ...,
        "availability_zone": "nova",
        "force_nodes": null,
        ...,
        "force_hosts": [
          "host613"
        ],
        "ignore_hosts": null,
        ...,
        "scheduler_hints": {}
      },
      ...
    }

    stack@controller:~/devstack$ nova live-migration tdp-server
    ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-78725630-e87b-426c-a4f6-dc31f9c08223)

    /opt/stack/logs/n-sch.log:2016-03-24 02:25:27.515 INFO nova.scheduler.host_manager [req-78725630-e87b-426c-a4f6-dc31f9c08223 admin admin] Host filter ignoring hosts: host613
    ...
    /opt/stack/logs/n-sch.log:2016-03-24 02:25:27.515 INFO nova.scheduler.host_manager [req-78725630-e87b-426c-a4f6-dc31f9c08223 admin admin] No hosts matched due to not matching 'force_hosts' value of 'host613'

  This breaks previous behavior: the force_hosts field was not "sticky", in
  that it did not prevent the scheduler from moving the VM to another host
  after the initial deploy. It previously only forced the initial deploy to
  go to a specific host.

  Two possible fixes come to mind:
  1) Do not save the force_hosts field in the DB. This may have unintended
     consequences that I have not thought through.
  2) Remove the force_hosts field from the request_spec object that is used
     for the live migration task.
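  The unsolvable constraint can be reproduced with a toy model of the two
  scheduler log lines above: ignored hosts are stripped first, then the
  remaining hosts are matched against force_hosts. This is a sketch under
  stated assumptions, not nova's host_manager or conductor code;
  SpecSketch, candidate_hosts, select_destination, live_migrate and the
  second host name "other-host" are hypothetical. The reset_forced flag
  models fix (2), which is also the direction taken by the commit at the
  top of this notification.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


class NoValidHost(Exception):
    """Stand-in for nova's NoValidHost scheduler failure."""


@dataclass
class SpecSketch:
    """Hypothetical subset of RequestSpec: only the two fields that matter here."""
    force_hosts: Optional[List[str]] = None
    ignore_hosts: Optional[List[str]] = None


def candidate_hosts(all_hosts: List[str], spec: SpecSketch) -> List[str]:
    """Toy model of the two host-manager steps seen in n-sch.log."""
    # Step 1: "Host filter ignoring hosts: ..."
    ignored = set(spec.ignore_hosts or [])
    hosts = [h for h in all_hosts if h not in ignored]
    # Step 2: if force_hosts is set, keep only those hosts.
    if spec.force_hosts:
        hosts = [h for h in hosts if h in spec.force_hosts]
    return hosts


def select_destination(all_hosts: List[str], spec: SpecSketch) -> str:
    hosts = candidate_hosts(all_hosts, spec)
    if not hosts:
        raise NoValidHost("No valid host was found.")
    return hosts[0]


def live_migrate(all_hosts: List[str], stored_spec: SpecSketch,
                 source: str, reset_forced: bool) -> str:
    # Work on a copy so the persisted spec itself is left untouched.
    spec = replace(stored_spec, ignore_hosts=[source])
    if reset_forced:
        # Fix option (2): do not carry the original force_hosts into the move.
        spec = replace(spec, force_hosts=None)
    return select_destination(all_hosts, spec)


hosts = ["host613", "other-host"]  # "other-host" is a hypothetical second node
stored = SpecSketch(force_hosts=["host613"])
try:
    live_migrate(hosts, stored, "host613", reset_forced=False)
except NoValidHost as exc:
    print("before fix:", exc)  # before fix: No valid host was found.
print("after fix:", live_migrate(hosts, stored, "host613", reset_forced=True))  # after fix: other-host
```

  Clearing the forced field on a copy, rather than in the stored row, keeps
  the original boot request intact while letting move operations schedule
  freely.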

