Reviewed: https://review.openstack.org/606122 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=05cd8d128211adbbfb3cf5d626034ccd0f75a452 Submitter: Zuul Branch: master
commit 05cd8d128211adbbfb3cf5d626034ccd0f75a452 Author: Matt Riedemann <[email protected]> Date: Fri Sep 28 11:18:14 2018 -0400 Fix InstanceNotFound during _destroy_evacuated_instances The _destroy_evacuated_instances method on compute startup tries to cleanup guests on the hypervisor and allocations held against that compute node resource provider by evacuated instances, but doesn't take into account that those evacuated instances could have been deleted in the meantime which leads to a lazy-load InstanceNotFound error that kills the startup of the compute service. This change does two things in the _destroy_evacuated_instances method: 1. Loads the evacuated instances with a read_deleted='yes' context when calling _get_instances_on_driver(). This should be fine since _get_instances_on_driver() is already returning deleted instances anyway (InstanceList.get_by_filters defaults to read deleted instances unless the filters tell it otherwise - which we don't in this case). This is needed so that things like driver.destroy() don't raise InstanceNotFound while lazy-loading fields on the instance. 2. Skips the call to remove_allocation_from_compute() if the evacuated instance is already deleted. If the instance is already deleted, its allocations should have been cleaned up by its hosting compute service (or the API). The functional regression test is updated to show the bug is now fixed. Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7 Closes-Bug: #1794996 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1794996 Title: _destroy_evacuated_instances fails and kills n-cpu startup if lazy- loading flavor on a deleted instance Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: Triaged Bug description: Seen here: http://logs.openstack.org/00/604400/5/check/nova-live- migration/6aa7a4b/logs/subnode-2/screen-n-cpu.txt.gz#_Sep_26_23_53_06_475005 Sep 26 23:53:06.475005 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: DEBUG nova.objects.instance [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Lazy-loading 'flavor' on Instance uuid 1d4eaa12-1477-4528-a06b-52e5672c6c61 {{(pid=31767) obj_load_attr /opt/stack/new/nova/nova/objects/instance.py:1109}} Sep 26 23:53:06.547498 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Error starting thread.: InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found. Sep 26 23:53:06.547786 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: Traceback (most recent call last): Sep 26 23:53:06.548012 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/conductor/manager.py", line 126, in _object_dispatch Sep 26 23:53:06.548241 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return getattr(target, method)(*args, **kwargs) Sep 26 23:53:06.548456 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper Sep 26 23:53:06.548670 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: result = fn(cls, context, *args, **kwargs) Sep 26 23:53:06.548881 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/objects/instance.py", line 503, in get_by_uuid Sep 26 23:53:06.549100 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: use_slave=use_slave) Sep 26 23:53:06.549319 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 210, in wrapper Sep 26 23:53:06.549530 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(*args, **kwargs) Sep 26 23:53:06.549743 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/objects/instance.py", line 495, in _db_instance_get_by_uuid Sep 26 23:53:06.549991 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: columns_to_join=columns_to_join) Sep 26 23:53:06.550272 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/api.py", line 758, in instance_get_by_uuid Sep 26 23:53:06.550515 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return IMPL.instance_get_by_uuid(context, uuid, columns_to_join) Sep 26 23:53:06.551049 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 168, in wrapper Sep 26 23:53:06.551268 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(*args, **kwargs) Sep 26 23:53:06.551480 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 255, in wrapped Sep 26 23:53:06.551696 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(context, *args, **kwargs) Sep 26 23:53:06.551908 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1843, in instance_get_by_uuid Sep 26 23:53:06.552125 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: columns_to_join=columns_to_join) Sep 26 23:53:06.552344 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1852, in _instance_get_by_uuid Sep 26 23:53:06.552561 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: raise exception.InstanceNotFound(instance_id=uuid) Sep 26 23:53:06.552772 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: InstanceNotFound: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found. Sep 26 23:53:06.552989 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service Traceback (most recent call last): Sep 26 23:53:06.553210 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 796, in run_service Sep 26 23:53:06.553420 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service service.start() Sep 26 23:53:06.553628 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/service.py", line 162, in start Sep 26 23:53:06.553878 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self.manager.init_host() Sep 26 23:53:06.554108 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/compute/manager.py", line 1204, in init_host Sep 26 23:53:06.554318 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service evacuated_instances = self._destroy_evacuated_instances(context) Sep 26 23:53:06.554527 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/compute/manager.py", line 710, in _destroy_evacuated_instances Sep 26 23:53:06.554741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service context, instance, cn_uuid, self.reportclient): Sep 26 23:53:06.554950 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/scheduler/utils.py", line 998, in remove_allocation_from_compute Sep 26 23:53:06.555159 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service flavor = instance.flavor Sep 26 23:53:06.555383 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 67, in getter Sep 26 23:53:06.555665 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self.obj_load_attr(name) Sep 26 23:53:06.555876 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/objects/instance.py", line 1139, in obj_load_attr Sep 26 23:53:06.556086 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self._load_flavor() Sep 26 23:53:06.556883 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/objects/instance.py", line 961, in _load_flavor Sep 26 23:53:06.557104 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service expected_attrs=['flavor']) Sep 26 23:53:06.557321 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper Sep 26 23:53:06.557531 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service args, kwargs) Sep 26 23:53:06.557741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions Sep 26 23:53:06.557979 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service args=args, kwargs=kwargs) Sep 26 23:53:06.558191 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call Sep 26 23:53:06.558409 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service retry=self.retry) Sep 26 23:53:06.558638 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send Sep 26 23:53:06.558853 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service retry=retry) Sep 26 23:53:06.559062 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send Sep 26 23:53:06.559271 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service call_monitor_timeout, retry=retry) Sep 26 23:53:06.559485 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 575, in _send Sep 26 23:53:06.559707 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service raise result Sep 26 23:53:06.559918 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found. This is a test for evacuate where we take down nova-compute on one host and evacuate instances from it. Then those instances are deleted and then the compute service is re-enabled and restarted. On restart, nova-compute is trying to cleanup allocations in placement for the evacuated instances, and to do that it tries to lazy-load the flavor from the deleted instance which fails because we're not using read_deleted='yes' on the context. This is similar to bug 1745977 which was fixed with change: https://review.openstack.org/#/q/Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0 But that only handled loading of generic attributes, in that case system_metadata. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1794996/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

