Public bug reported:

Steps:
1) Deploy a VM to a specific host using availability zones (i.e., do a targeted 
deploy).
2) Attempt to live migrate the VM from (1) letting the scheduler decide what 
host to live migrate to (i.e., do an untargeted live migration).

Outcome:
The live migration will always fail.

Version: mitaka

This is happening because of the following recent change:
https://github.com/openstack/nova/commit/111a852e79f0d9e54228d8e2724dc4183f737397.
The recent change pulls the request spec from the original deploy out of
the DB and uses it for the live migration. Since the initial deploy of
the VM was targeted, the request spec object saved in the DB has the
"force_hosts" field set to a specific host. Part of the live migrate
flow will set the "ignore_hosts" field of said request spec object to
said specific host since it doesn't make sense to live migrate to the
source host. This results in unsolvable constraints for the scheduler.

nova/compute/api.py::live_migrate():
    ...
    try:
        # This fetches the request spec from the DB, which will have
        # force_hosts set.
        request_spec = objects.RequestSpec.get_by_instance_uuid(
            context, instance.uuid)
    ...
    self.compute_task_api.live_migrate_instance(context, instance,
        host_name, block_migration=block_migration,
        disk_over_commit=disk_over_commit,
        request_spec=request_spec)

After a lot of API plumbing, the flow ends up in
nova/conductor/tasks/live_migrate.py::_find_destination():
    ...
    attempted_hosts = [self.source]
    ...
    host = None
    while host is None:
        ...
        # This puts the source host in the "ignore_hosts" field, so the
        # request_spec passed to the scheduler below is unsolvable: the
        # scheduler will never find a valid host to migrate to.
        request_spec.ignore_hosts = attempted_hosts
        try:
            host = self.scheduler_client.select_destinations(
                self.context, request_spec)[0]['host']
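The conflict can be seen with a minimal sketch (illustrative only, not
nova's actual filtering code): any host that survives the force_hosts
check is then rejected by the ignore_hosts check, so the candidate list
is always empty.

```python
# Minimal sketch of the scheduler's host filtering constraints, for
# illustration only (not nova's actual filter implementation).

def select_hosts(all_hosts, force_hosts, ignore_hosts):
    candidates = list(all_hosts)
    if force_hosts:
        # Only the forced hosts may pass...
        candidates = [h for h in candidates if h in force_hosts]
    # ...but the source host is always ignored during live migration.
    candidates = [h for h in candidates if h not in ignore_hosts]
    return candidates

# Stored spec from the targeted deploy: force_hosts = ["host613"].
# Live migration then adds the source to ignore_hosts = ["host613"].
print(select_hosts(["host613", "host456"], ["host613"], ["host613"]))
# -> [] : "No valid host was found"
```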

Example on a two-node devstack environment:

stack@controller:~/devstack$ nova boot tdp-server --image 13a9f724-36ef-46ae-896d-f4f003ac1a10 --flavor m1.tiny --availability-zone nova:host613

stack@controller:~/devstack$ nova list --fields name,status,OS-EXT-SRV-ATTR:host
+--------------------------------------+------------+--------+-----------------------+
| ID                                   | Name       | Status | OS-EXT-SRV-ATTR: Host |
+--------------------------------------+------------+--------+-----------------------+
| a9fe19e4-5528-40f2-af08-031eaf4c33a6 | tdp-server | ACTIVE | host613               |
+--------------------------------------+------------+--------+-----------------------+

mysql> select spec from request_specs where instance_uuid="a9fe19e4-5528-40f2-af08-031eaf4c33a6";
{  
    ...
    "nova_object.name":"RequestSpec",
    "nova_object.data":{  
        "instance_uuid":"a9fe19e4-5528-40f2-af08-031eaf4c33a6",
        ...,
        "availability_zone":"nova",
        "force_nodes":null,
        ...,
        "force_hosts":[  
            "host613"
        ],
        "ignore_hosts":null,
        ...,
        "scheduler_hints":{}
    },
        ...
}

stack@controller:~/devstack$ nova live-migration tdp-server
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-78725630-e87b-426c-a4f6-dc31f9c08223)

/opt/stack/logs/n-sch.log:2016-03-24 02:25:27.515 INFO nova.scheduler.host_manager [req-78725630-e87b-426c-a4f6-dc31f9c08223 admin admin] Host filter ignoring hosts: host613
...
/opt/stack/logs/n-sch.log:2016-03-24 02:25:27.515 INFO nova.scheduler.host_manager [req-78725630-e87b-426c-a4f6-dc31f9c08223 admin admin] No hosts matched due to not matching 'force_hosts' value of 'host613'

This breaks previous behavior: the force_hosts field was not "sticky",
in that it did not prevent the scheduler from moving the VM to another
host after the initial deploy. Previously it only forced the initial
deploy to go to a specific host.

Two possible fixes come to mind:

1) Do not save the force_hosts field in the DB. This may have unintended 
consequences that I have not thought through.
2) Remove the force_hosts field from the request_spec object that is used for 
the live migration task.
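Fix (2) could look something like the following sketch. The helper
reset_forced_destinations() is hypothetical (not existing nova code),
and FakeRequestSpec is a stand-in for objects.RequestSpec; the field
names match the RequestSpec fields shown in the DB dump above.

```python
# Sketch of fix (2): clear the stale forced destination from the stored
# request spec before the scheduler picks a live migration target.
# reset_forced_destinations() is a hypothetical helper, not an actual
# nova patch.

def reset_forced_destinations(request_spec):
    request_spec.force_hosts = None
    request_spec.force_nodes = None


class FakeRequestSpec(object):
    """Stand-in for objects.RequestSpec, for illustration only."""
    def __init__(self, force_hosts=None, force_nodes=None):
        self.force_hosts = force_hosts
        self.force_nodes = force_nodes


# Spec left over from the targeted deploy: force_hosts = ["host613"].
spec = FakeRequestSpec(force_hosts=["host613"])
reset_forced_destinations(spec)
# force_hosts is cleared, so the scheduler is free to pick any host
# that passes the normal filters (minus ignore_hosts).
print(spec.force_hosts)
# -> None
```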

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1561357

Title:
  VM deployed with availability-zone (force_hosts) cannot be live
  migrated

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1561357/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
