It looks like the _destroy_evacuated_instances method in the compute
manager has always filtered migrations on the 'accepted' status, since
originally this code was just meant to clean up local resources once an
evacuation from the source host had started, which is fine. The problem
is that it also removes the source node allocations when the evacuation
failed. If the evacuation failed in the conductor, we can fix that here:

https://review.openstack.org/#/c/499237/
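
For reference, that conductor-side fix amounts to marking the migration
as failed when scheduling blows up. A minimal sketch, not the literal
patch (_schedule_instances is approximate; see the review above for the
real change):

    from nova import exception

    def _rebuild_excerpt(self, context, migration, request_spec):
        # Hypothetical excerpt of ConductorManager.rebuild_instance:
        # if the scheduler cannot place the evacuated instance, flip
        # the migration record to 'failed' before re-raising.
        try:
            selection = self._schedule_instances(context, request_spec)
        except exception.NoValidHost:
            if migration:
                migration.status = 'failed'
                migration.save()
            raise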

If it failed in the destination compute service, the migration status
should be set to 'failed' and the migration filter in
_destroy_evacuated_instances would filter it out.
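
For context, the filter looks roughly like this (a sketch from memory of
the Pike-era compute manager, names approximate):

    from nova import objects

    def _get_evacuations_excerpt(self, context):
        # Hypothetical excerpt of _destroy_evacuated_instances: only
        # 'accepted' evacuation migrations away from this host match,
        # so a migration flipped to 'failed' is ignored on restart.
        filters = {'source_compute': self.host,
                   'migration_type': 'evacuation',
                   'status': 'accepted'}
        return objects.MigrationList.get_by_filters(context, filters)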

** Also affects: nova/ocata
   Importance: Undecided
       Status: New

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1713783

Title:
  After failed evacuation the recovered source compute tries to delete
  the instance

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) newton series:
  Triaged
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged

Bug description:
  Description
  ===========
After a failed evacuation attempt the status of the migration is
  'accepted' instead of 'failed', so when the source compute is recovered
  the compute manager tries to delete the instance from the source host.
  However, a secondary fault prevents deleting the allocation in
  placement, so the actual deletion of the instance fails as well.
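
  Roughly, the cleanup path that the recovered source host runs on
  startup looks like this (a simplified sketch; method names are
  approximate and the allocation helper is hypothetical):

      from nova import objects
      from oslo_log import log as logging

      LOG = logging.getLogger(__name__)

      def _cleanup_excerpt(self, context, evacuations):
          for migration in evacuations:
              instance = objects.Instance.get_by_uuid(
                  context, migration.instance_uuid)
              LOG.info('Deleting instance as it has been evacuated '
                       'from this host', instance=instance)
              # Destroy the local guest and its disks first...
              self.driver.destroy(context, instance, network_info=None)
              # ...then drop this host's allocation in placement. This
              # is the call that hits the secondary fault above, so the
              # instance deletion fails as well. (Hypothetical helper.)
              self._delete_allocation_for_evacuated_instance(
                  instance, migration.source_node)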

  Steps to reproduce
  ==================
The following functional test reproduces the bug:
  https://review.openstack.org/#/c/498482/
  It initiates an evacuation when no valid host is available; the
  evacuation fails, but the compute manager still tries to delete the
  instance.
  Logs:

      2017-08-29 19:11:15,751 ERROR [oslo_messaging.rpc.server] Exception during message handling
      NoValidHost: No valid host was found. There are not enough hosts available.
      2017-08-29 19:11:16,103 INFO [nova.tests.functional.test_servers] Running periodic for compute1 (host1)
      2017-08-29 19:11:16,115 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,120 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,131 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/allocations" status: 200 len: 152 microversion: 1.0
      2017-08-29 19:11:16,138 INFO [nova.compute.resource_tracker] Final resource view: name=host1 phys_ram=8192MB used_ram=1024MB phys_disk=1028GB used_disk=1GB total_vcpus=10 used_vcpus=1 pci_stats=[]
      2017-08-29 19:11:16,146 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,151 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,152 INFO [nova.tests.functional.test_servers] Running periodic for compute2 (host2)
      2017-08-29 19:11:16,163 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,168 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,176 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/allocations" status: 200 len: 54 microversion: 1.0
      2017-08-29 19:11:16,184 INFO [nova.compute.resource_tracker] Final resource view: name=host2 phys_ram=8192MB used_ram=512MB phys_disk=1028GB used_disk=0GB total_vcpus=10 used_vcpus=0 pci_stats=[]
      2017-08-29 19:11:16,192 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,197 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,198 INFO [nova.tests.functional.test_servers] Finished with periodics
      2017-08-29 19:11:16,255 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/5058200c-478e-4449-88c1-906fdd572662" status: 200 len: 1875 microversion: 2.53 time: 0.056198
      2017-08-29 19:11:16,262 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/os-migrations" status: 200 len: 373 microversion: 2.53 time: 0.004618
      2017-08-29 19:11:16,280 INFO [nova.api.openstack.requestlog] 127.0.0.1 "PUT /v2.1/6f70656e737461636b20342065766572/os-services/c269bc74-4720-4de4-a6e5-889080b892a0" status: 200 len: 245 microversion: 2.53 time: 0.016442
      2017-08-29 19:11:16,281 INFO [nova.service] Starting compute node (version 16.0.0)
      2017-08-29 19:11:16,296 INFO [nova.compute.manager] Deleting instance as it has been evacuated from this host

