I know what the problem is:
(9:59:34 AM) mriedem: set_inventory_for_provider -> _ensure_resource_provider -> _create_resource_provider -> safe_connect returns None because it can't talk to placement yet
(9:59:41 AM) mriedem: https://review.openstack.org/#/c/524618/2/nova/scheduler/client/report.py@516
(9:59:44 AM) mriedem: so we put None in the cache
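
The failure mode described above can be sketched roughly as follows. This is a simplified illustration, not nova's actual code: the class and method bodies here are hypothetical stand-ins for the report client in nova/scheduler/client/report.py, where the @safe_connect decorator makes the create call return None when placement is unreachable.

```python
class FakeReportClient(object):
    """Illustrative stand-in for nova's scheduler report client."""

    def __init__(self):
        # rp_uuid -> resource provider dict (or, buggily, None)
        self._provider_cache = {}

    def _create_resource_provider(self, rp_uuid):
        # Stand-in for the @safe_connect-decorated call: when placement
        # is not reachable yet, the decorator swallows the failure and
        # the method effectively returns None.
        return None

    def _ensure_resource_provider(self, rp_uuid):
        if rp_uuid not in self._provider_cache:
            # The bug: the None return value is cached without a check,
            # so it sticks for the life of the process.
            self._provider_cache[rp_uuid] = self._create_resource_provider(
                rp_uuid)
        return self._provider_cache[rp_uuid]

    def _get_inventory_and_update_provider_generation(self, rp_uuid):
        my_rp = self._ensure_resource_provider(rp_uuid)
        # With None in the cache, this subscript raises
        # TypeError: 'NoneType' object has no attribute '__getitem__'
        # (Python 2 wording; Python 3 says "is not subscriptable").
        return my_rp['generation']
```

Because the None is cached rather than retried, every later periodic task hits the same TypeError, which matches the traceback below repeating for the whole nova-compute run.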
** Changed in: nova
       Status: New => Triaged

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Matt Riedemann (mriedem)

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova
   Importance: High => Medium
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1767139
Title:
  TypeError in _get_inventory_and_update_provider_generation

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Bug description:

Description
===========
Bringing up a new cluster as part of our CI after switching from 16.1.0
to 16.1.1 on CentOS, I'm seeing this error on some computes:
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager Traceback (most recent call last):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6752, in update_available_resource_for_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 704, in update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 728, in _update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 585, in _init_compute_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update(context, cn)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 886, in _update
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 64, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 789, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_inventory(rp_uuid, inv_data)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 56, in wrapper
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(self, *a, **k)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 675, in _update_inventory
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if self._update_inventory_attempt(rp_uuid, inv_data):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 562, in _update_inventory_attempt
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     curr = self._get_inventory_and_update_provider_generation(rp_uuid)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 546, in _get_inventory_and_update_provider_generation
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if server_gen != my_rp['generation']:
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager TypeError: 'NoneType' object has no attribute '__getitem__'
The error appears to persist for the duration of a single nova-compute run.
Steps to reproduce
==================
Nodes were started by our CI infrastructure. We start 3 computes and
a single control node. In 50% of cases, one of the computes comes up
in this bad state.
Expected result
===============
Working cluster.
Actual result
=============
At least one of the three compute nodes fails to join the cluster: it is
not picked up by discover_hosts, and the above stack trace is repeated
in the nova-compute logs.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
$ rpm -qa | grep nova
python-nova-16.1.1-1.el7.noarch
openstack-nova-common-16.1.1-1.el7.noarch
python2-novaclient-9.1.1-1.el7.noarch
openstack-nova-api-16.1.1-1.el7.noarch
openstack-nova-compute-16.1.1-1.el7.noarch
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
$ rpm -qa | grep kvm
libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
qemu-kvm-common-ev-2.9.0-16.el7_4.14.1.x86_64
qemu-kvm-ev-2.9.0-16.el7_4.14.1.x86_64
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
Not sure
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
Neutron with Calico (I work on Calico, this is our CI system)
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767139/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp