Reviewed: https://review.openstack.org/454426 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa943499128ba77b062dff75ec9b48e54f7d5021 Submitter: Jenkins Branch: master
commit aa943499128ba77b062dff75ec9b48e54f7d5021 Author: Matt Riedemann <[email protected]> Date: Thu Apr 6 22:08:52 2017 -0400 Handle new hosts for updating instance info in scheduler As of change 791cf0643401f72ce834e580938057e325945169 we have to go through host and cell mappings to get to a cell database for host (compute node) and instance information. When a new compute service starts up it casts to update_instance_info and sends an empty list, which triggers the scheduler to try and lookup all instances on that host. If the new compute host is not yet mapped in a cell, we'll get a HostMappingNotFound error. We can handle this case in both the HostManager and ComputeManager by not casting from the compute if there is no information to send, and we can also handle it in the HostManager (for older computes) but just dealing with the HostMappingNotFound gracefully. This information eventually self-heals via normal operation. Change-Id: I7cec2eff35c0615534fcb4d5148f75721824172e Closes-Bug: #1680661 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1680661 Title: HostMappingNotFound during update_instance_info on n-cpu startup Status in OpenStack Compute (nova): Fix Released Bug description: Noticed this in the nova-scheduler logs during a failed test run today: http://logs.openstack.org/16/453916/4/check/gate-tempest-dsvm-py35 -ubuntu- xenial/09850e6/logs/screen-n-sch.txt.gz?level=TRACE#_2017-04-06_23_11_45_834 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server [req-36e4db2a-38e1-4810-9309-8893598e195a - -] Exception during message handling 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/server.py", line 157, in _process_incoming 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/manager.py", line 125, in update_instance_info 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server instance_info) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 774, in update_instance_info 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server self._recreate_instance_info(context, host_name) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 745, in _recreate_instance_info 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server inst_dict = self._get_instances_by_host(context, host_name) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 717, in _get_instances_by_host 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server hm = objects.HostMapping.get_by_host(context, host_name) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 100, in get_by_host 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server db_mapping = cls._get_by_host_from_db(context, host) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 963, in wrapper 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return fn(*args, **kwargs) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 95, in _get_by_host_from_db 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server raise exception.HostMappingNotFound(name=host) 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server nova.exception.HostMappingNotFound: Host 'ubuntu-xenial-internap-mtl01-8316357' is not mapped to any cell 2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server This is most likely a side effect of this change: https://review.openstack.org/#/c/439891/ This could be a race on startup where the compute node is sending it's instance information to the scheduler before the compute host is mapped to a cell, in which case it wouldn't have any instances to send so the scheduler will try to look them up from the database and blow up because the host isn't mapped yet. I see from the log: Total number of compute nodes: 0 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:439 Adding 0 instances for hosts 10-20 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:459 It's definitely happening on startup. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1680661/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

