[Yahoo-eng-team] [Bug 1605804] Re: Instance creation sometimes fails after host aggregate deletion

2016-10-15 Thread Matt Riedemann
** Also affects: nova/mitaka
   Importance: Undecided
   Status: New

** Changed in: nova/mitaka
 Assignee: (unassigned) => Roman Podoliaka (rpodolyaka)

** Changed in: nova/mitaka
   Status: New => In Progress

** Changed in: nova/mitaka
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1605804

Title:
  Instance creation sometimes fails after host aggregate deletion

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  In Progress

Bug description:
  Instance creation starts failing if the nova scheduler gets into an
inconsistent state with respect to host aggregates. If the
remove_host_from_aggregate operation is invoked for multiple hosts in quick
succession, followed by deletion of the aggregate, the nova scheduler's
host_manager maps (host_aggregates_map and aggs_by_id) get out of sync: stale
references remain in host_aggregates_map for an aggregate that has already
been deleted from aggs_by_id.
  This happens because the deletion cleans up state based on aggregate.hosts,
which is empty when the aggregate is deleted, while the earlier aggregate
updates that removed individual hosts may have added an incorrect list of
hosts to host_aggregates_map.
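
  The failure mode described above can be sketched with a simplified model.
This is not the nova source: the class and method names (Aggregate,
HostManagerSketch, update_aggregate, aggregates_for, build_inconsistent_state)
are hypothetical, and the out-of-order delivery of update snapshots is an
assumed trigger chosen for illustration. Only the two map names and the use of
aggregate.hosts during deletion come from the report.

```python
import collections


class Aggregate(object):
    """Hypothetical stand-in for an aggregate snapshot seen by the scheduler."""

    def __init__(self, agg_id, hosts):
        self.id = agg_id
        self.hosts = list(hosts)


class HostManagerSketch(object):
    """Simplified model of the two maps named in the bug description."""

    def __init__(self):
        self.aggs_by_id = {}
        self.host_aggregates_map = collections.defaultdict(set)

    def update_aggregate(self, aggregate):
        # Record the snapshot and trust its host list, even if it is stale.
        self.aggs_by_id[aggregate.id] = aggregate
        for host in aggregate.hosts:
            self.host_aggregates_map[host].add(aggregate.id)
        for host, agg_ids in self.host_aggregates_map.items():
            if host not in aggregate.hosts:
                agg_ids.discard(aggregate.id)

    def delete_aggregate(self, aggregate):
        # Cleanup driven by aggregate.hosts, which is empty at deletion time.
        self.aggs_by_id.pop(aggregate.id, None)
        for host in aggregate.hosts:
            self.host_aggregates_map[host].discard(aggregate.id)

    def aggregates_for(self, host):
        # Loosely mirrors a per-host lookup from one map into the other.
        return [self.aggs_by_id[agg_id]
                for agg_id in self.host_aggregates_map[host]]


def build_inconsistent_state():
    manager = HostManagerSketch()
    manager.update_aggregate(Aggregate(1, ["host1", "host2"]))
    # Assumed race: the snapshot taken after both removals is applied first,
    # then the stale snapshot taken after only the first removal.
    manager.update_aggregate(Aggregate(1, []))
    manager.update_aggregate(Aggregate(1, ["host2"]))
    # Deletion sees an empty hosts list, so host2 keeps its reference.
    manager.delete_aggregate(Aggregate(1, []))
    return manager


if __name__ == "__main__":
    manager = build_inconsistent_state()
    try:
        manager.aggregates_for("host2")
    except KeyError as exc:
        print("KeyError: %s" % exc)
```

  In this model host2 still points at aggregate 1 after the aggregate is gone
from aggs_by_id, so any later lookup for host2 raises KeyError.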

  Instance creation fails with the error below once the scheduler gets into this state:
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher [req-7f29701b-0272-444c-8650-a1035777e642 d2c755daa21e451e86c1d2b5be705aa2 0546d7f9c747456aa0ffb306cfe5627d - - -] Exception during message handling: 1
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     incoming.message))
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 150, in inner
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/nova/scheduler/manager.py", line 84, in select_destinations
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     filter_properties)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 72, in select_destinations
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     filter_properties)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 164, in _schedule
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     hosts = self._get_all_host_states(elevated)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 222, in _get_all_host_states
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     return self.host_manager.get_all_host_states(context)
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher   File "/opt/pf9/nova/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 585, in get_all_host_states
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher     host_state.host]]
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher KeyError: 1
  2016-07-21 18:20:16.780 15692 ERROR oslo_messaging.rpc.dispatcher
  2016-07-21 18:20:16.784 15692 ERROR oslo_messaging._drivers.common [req-7f29701b-0272-444c-8650-a1035777e642 d2c755daa21e451e86c1d2b5be705aa2 0546d7f9c747456aa0ffb306cfe5627d - - -] Returning exception 1 to caller

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1605804/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team

[Yahoo-eng-team] [Bug 1605804] Re: Instance creation sometimes fails after host aggregate deletion

2016-09-06 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/352344
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=f0dd4d6bdd286ea155cf55eb62662993577d8892
Submitter: Jenkins
Branch:    master

commit f0dd4d6bdd286ea155cf55eb62662993577d8892
Author: Markus Zoeller 
Date:   Mon Aug 8 12:46:43 2016 +0200

Fix corrupt "host_aggregates_map" in host_manager

A host can be in multiple host-aggregates at the same time. When a
host gets removed from an aggregate in thread A and this aggregate
gets deleted in thread B, there can be a race-condition where the
mapping data in the host_manager can get out of sync for a moment.

This change simulates this condition in a unit test and fixes the bug
by iterating over the mapping itself instead of the out-of-date list
"aggregates.hosts".

Closes-Bug: 1605804
Change-Id: I59861f03f0c681f7118782fb017af377e07552aa
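
The approach named in the commit message, iterating over the mapping itself
rather than the aggregate's hosts list, can be sketched as follows. This is a
minimal illustration and not the committed patch: the function signature, the
make_stale_state helper, and the sample data are hypothetical.

```python
import collections


def delete_aggregate(aggs_by_id, host_aggregates_map, aggregate_id):
    """Drop an aggregate and every reference to it in the host mapping."""
    aggs_by_id.pop(aggregate_id, None)
    # Walk the mapping itself instead of a possibly out-of-date hosts list,
    # so a stale entry is removed even if the aggregate no longer lists
    # the host.
    for agg_ids in host_aggregates_map.values():
        agg_ids.discard(aggregate_id)


def make_stale_state():
    # Hypothetical starting point: aggregate 1 no longer lists host2 as a
    # member, yet the mapping still references it.
    aggs_by_id = {1: "aggregate-1", 2: "aggregate-2"}
    host_aggregates_map = collections.defaultdict(set)
    host_aggregates_map["host1"].update([2])
    host_aggregates_map["host2"].update([1, 2])
    return aggs_by_id, host_aggregates_map


if __name__ == "__main__":
    aggs, mapping = make_stale_state()
    delete_aggregate(aggs, mapping, 1)
    print(sorted(aggs), {h: sorted(ids) for h, ids in mapping.items()})
```

Because the cleanup no longer depends on the hosts list carried by the
aggregate, a host that is still in other aggregates keeps those references and
loses only the deleted one.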


** Changed in: nova
   Status: In Progress => Fix Released
