http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Unable%20to%20find%20existing%20allocations%20for%20instance%5C%22%20AND%20tags%3A%5C%22screen-n-super-cond.txt%5C%22&from=7d
** Also affects: nova/pike
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1729371
Title:
ResourceTracker races to delete instance allocations before instance
is mapped to a cell
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) pike series:
Triaged
Bug description:
We hit this in Queens CI: during a resize, when we go to swap the instance
allocations to the migration uuid, the instance allocations on the source
node are not found, which should not happen:
http://logs.openstack.org/08/516708/3/gate/legacy-tempest-dsvm-
py35/4d8d6a3/logs/screen-n-super-cond.txt.gz#_Oct_31_23_18_04_391235
Oct 31 23:18:04.391235 ubuntu-xenial-rax-iad-0000635032
nova-conductor[22376]: ERROR nova.conductor.tasks.migrate [None
req-2bd7178b-307e-4342-a156-f9645b7f75a5 tempest-MigrationsAdminTest-834796780
tempest-MigrationsAdminTest-834796780] [instance:
87329b8a-0fa5-4467-b69e-6c43d4633f54] Unable to find existing allocations for
instance
Oct 31 23:18:04.421480 ubuntu-xenial-rax-iad-0000635032
nova-conductor[22376]: WARNING nova.scheduler.utils [None
req-2bd7178b-307e-4342-a156-f9645b7f75a5 tempest-MigrationsAdminTest-834796780
tempest-MigrationsAdminTest-834796780] Failed to compute_task_migrate_server:
Instance 87329b8a-0fa5-4467-b69e-6c43d4633f54 is unacceptable: Instance has no
source node allocation: nova.exception.InstanceUnacceptable: Instance
87329b8a-0fa5-4467-b69e-6c43d4633f54 is unacceptable: Instance has no source
node allocation
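For clarity, the conductor-side check that produces that error amounts to something like the following. This is an illustrative sketch, not nova's actual code; the names `swap_allocations_to_migration` and the `get_allocations` callable are hypothetical stand-ins for the real conductor/Placement-client calls.

```python
# Hypothetical sketch of the conductor check behind the error above.
# Names are illustrative; get_allocations stands in for the Placement
# client call that fetches the instance's current allocations.
class InstanceUnacceptable(Exception):
    pass

def swap_allocations_to_migration(get_allocations, instance_uuid):
    """Fetch the instance's source-node allocations so they can be moved
    to the migration uuid; fail if the scheduler-created allocations are
    already gone."""
    source_allocs = get_allocations(instance_uuid)
    if not source_allocs:
        raise InstanceUnacceptable(
            'Instance %s is unacceptable: Instance has no source node '
            'allocation' % instance_uuid)
    return source_allocs

# In the failing run, Placement returns no allocations because the
# ResourceTracker has already deleted them:
try:
    swap_allocations_to_migration(
        lambda uuid: {}, '87329b8a-0fa5-4467-b69e-6c43d4633f54')
except InstanceUnacceptable as exc:
    print(exc)
```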
Looking in Placement for that instance, its allocations were created
by the scheduler here:
http://logs.openstack.org/08/516708/3/gate/legacy-tempest-dsvm-
py35/4d8d6a3/logs/screen-placement-api.txt.gz#_Oct_31_23_18_00_805083
Oct 31 23:18:00.637846 ubuntu-xenial-rax-iad-0000635032
[email protected][15777]: DEBUG
nova.api.openstack.placement.requestlog [None
req-d285a5a5-704b-4f46-98d5-f6e5ae7f6b4f service placement] Starting request:
104.239.175.193 "PUT
/placement/allocations/87329b8a-0fa5-4467-b69e-6c43d4633f54" {{(pid=15780)
__call__ /opt/stack/new/nova/nova/api/openstack/placement/requestlog.py:38}}
Oct 31 23:18:00.684152 ubuntu-xenial-rax-iad-0000635032
[email protected][15777]: DEBUG
nova.api.openstack.placement.handlers.allocation [None
req-d285a5a5-704b-4f46-98d5-f6e5ae7f6b4f service placement] Successfully wrote
allocations
AllocationList[Allocation(consumer_id=87329b8a-0fa5-4467-b69e-6c43d4633f54,id=23,project_id='d349ceecbff744a2943294be5bb7e427',resource_class='MEMORY_MB',resource_provider=ResourceProvider(ec18595d-7007-47c9-bf13-38c8cf8a8bb0),used=64,user_id='d44d813424704df8996b7d77840283c9'),
Allocation(consumer_id=87329b8a-0fa5-4467-b69e-6c43d4633f54,id=24,project_id='d349ceecbff744a2943294be5bb7e427',resource_class='VCPU',resource_provider=ResourceProvider(ec18595d-7007-47c9-bf13-38c8cf8a8bb0),used=1,user_id='d44d813424704df8996b7d77840283c9')]
{{(pid=15780) _set_allocations
/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation.py:249}}
And shortly after that we see them deleted:
http://logs.openstack.org/08/516708/3/gate/legacy-tempest-dsvm-
py35/4d8d6a3/logs/screen-placement-api.txt.gz#_Oct_31_23_18_00_805083
Oct 31 23:18:00.805083 ubuntu-xenial-rax-iad-0000635032
[email protected][15777]: DEBUG
nova.api.openstack.placement.requestlog [None
req-1fc99e13-67ca-4180-9a7f-d3a01a219b15 service placement] Starting request:
104.239.175.193 "DELETE
/placement/allocations/87329b8a-0fa5-4467-b69e-6c43d4633f54" {{(pid=15780)
__call__ /opt/stack/new/nova/nova/api/openstack/placement/requestlog.py:38}}
Oct 31 23:18:00.814342 ubuntu-xenial-rax-iad-0000635032
[email protected][15777]: DEBUG
nova.api.openstack.placement.handlers.allocation [None
req-1fc99e13-67ca-4180-9a7f-d3a01a219b15 service placement] Successfully
deleted allocations
AllocationList[Allocation(consumer_id=87329b8a-0fa5-4467-b69e-6c43d4633f54,id=<?>,project_id=<?>,resource_class='MEMORY_MB',resource_provider=ResourceProvider(ec18595d-7007-47c9-bf13-38c8cf8a8bb0),used=64,user_id=<?>),
Allocation(consumer_id=87329b8a-0fa5-4467-b69e-6c43d4633f54,id=<?>,project_id=<?>,resource_class='VCPU',resource_provider=ResourceProvider(ec18595d-7007-47c9-bf13-38c8cf8a8bb0),used=1,user_id=<?>)]
{{(pid=15780) delete_allocations
/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation.py:307}}
Looking at the compute service logs, it appears that the
update_available_resource periodic task on the compute is running in
the window between when the scheduler creates the allocations against
the compute node and when the instance is created in the cell
database:
http://logs.openstack.org/08/516708/3/gate/legacy-tempest-dsvm-
py35/4d8d6a3/logs/screen-n-cpu.txt.gz#_Oct_31_23_18_00_165850
http://logs.openstack.org/08/516708/3/gate/legacy-tempest-dsvm-
py35/4d8d6a3/logs/screen-n-cpu.txt.gz#_Oct_31_23_18_00_780729
Oct 31 23:18:00.780729 ubuntu-xenial-rax-iad-0000635032 nova-compute[23251]:
DEBUG nova.compute.resource_tracker [None
req-9743bef5-318d-4e7e-a71b-26c62cd0af2d tempest-MigrationsAdminTest-834796780
tempest-MigrationsAdminTest-834796780] Instance
87329b8a-0fa5-4467-b69e-6c43d4633f54 has been deleted (perhaps locally).
Deleting allocations that remained for this instance against this compute host:
{'resources': {'VCPU': 1, 'MEMORY_MB': 64}}. {{(pid=23251)
_remove_deleted_instances_allocations
/opt/stack/new/nova/nova/compute/resource_tracker.py:1180}}
Oct 31 23:18:00.817854 ubuntu-xenial-rax-iad-0000635032 nova-compute[23251]:
INFO nova.scheduler.client.report [None
req-9743bef5-318d-4e7e-a71b-26c62cd0af2d tempest-MigrationsAdminTest-834796780
tempest-MigrationsAdminTest-834796780] Deleted allocation for instance
87329b8a-0fa5-4467-b69e-6c43d4633f54
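The race reduces to an ordering problem, which the following self-contained sketch reproduces. The dicts `cell_db` and `allocations` are hypothetical in-memory stand-ins for the cell database and Placement; the function mimics the deletion heuristic, not nova's actual implementation.

```python
# Stand-ins for the cell database and Placement (illustrative only).
cell_db = {}          # instance_uuid -> instance record
allocations = {}      # instance_uuid -> resources dict

def remove_deleted_instances_allocations(instance_uuid):
    # Racy heuristic: "not found in the cell DB" is treated as
    # "deleted", but a just-scheduled instance may simply not be
    # mapped to the cell yet.
    if instance_uuid not in cell_db:
        allocations.pop(instance_uuid, None)

uuid = '87329b8a-0fa5-4467-b69e-6c43d4633f54'
allocations[uuid] = {'VCPU': 1, 'MEMORY_MB': 64}  # scheduler PUT /allocations
remove_deleted_instances_allocations(uuid)        # periodic task fires early
cell_db[uuid] = {'host': 'compute1'}              # conductor creates instance
print(allocations.get(uuid))                      # -> None: allocations lost
```

The later resize then finds no source-node allocations, matching the conductor error above.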
That's because of this code, which gets an InstanceNotFound and assumes
the instance was deleted:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L1169-L1181
The compute service shouldn't have access to the nova_api database, so
we can't check whether the instance is still represented as a build
request; we'll have to rework this deleted-instance logic in the
resource tracker to fix this race.
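One possible direction, purely illustrative (the `deleted_uuids` set is a hypothetical stand-in for whatever positive evidence of deletion the real fix would use): only delete allocations when the instance is provably deleted, rather than treating "not yet in the cell DB" as "deleted".

```python
# Illustrative sketch of a guarded cleanup, not a proposed patch.
cell_db = {}          # instance_uuid -> instance record
deleted_uuids = set() # hypothetical record of confirmed deletions
allocations = {}      # instance_uuid -> resources dict

def remove_deleted_instances_allocations(instance_uuid):
    if instance_uuid in cell_db:
        return                  # instance exists; nothing to clean up
    if instance_uuid not in deleted_uuids:
        return                  # not provably deleted: it may just not be
                                # mapped to the cell yet, so keep the
                                # scheduler-created allocations
    allocations.pop(instance_uuid, None)

uuid = '87329b8a-0fa5-4467-b69e-6c43d4633f54'
allocations[uuid] = {'VCPU': 1, 'MEMORY_MB': 64}
remove_deleted_instances_allocations(uuid)   # instance not mapped yet
print(allocations[uuid])                     # allocations survive the race
```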
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1729371/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp