[Yahoo-eng-team] [Bug 1781710] Re: ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity failing with "Servers are on the same host"

OpenStack Infra Mon, 23 Jul 2018 20:57:02 -0700

Reviewed:  https://review.openstack.org/583347
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c22b53c2481bac518a6b32cdee7b7df23d91251e
Submitter: Zuul
Branch:    master


commit c22b53c2481bac518a6b32cdee7b7df23d91251e
Author: Matt Riedemann <[email protected]>
Date:   Tue Jul 17 17:43:37 2018 -0400

    Update RequestSpec.instance_uuid during scheduling
    
    Before change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb
    the ServerGroupAntiAffinityFilter did not rely on the
    HostState.instances dict to determine **within the same
    request** how many members of the same anti-affinity
    group were on a given host. In fact, at that time, the
    HostState.instances dict wasn't updated between filter
    runs in the same multi-create request. That was fixed with
    change Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b so that
    as we select a host for each group member being created
    within a single request, we also update the HostState.instances
    dict so the ServerGroupAntiAffinityFilter would have
    accurate tracking of which group members were on which
    hosts.
    
    However, that did not account for a wrinkle in the filter
    added in change Ie016f59f5b98bb9c70b3e33556bd747f79fc77bd,
    which is needed to allow resizing a server to the same host
    when that server is in an anti-affinity group. That wrinkle,
    combined with the fact that the RequestSpec the filter is acting
    upon for a given instance in a multi-create request might
    not actually have that instance's instance_uuid, can cause the
    filter to think it's in a resize situation and accept a host which
    already has another member from the same anti-affinity group
    on it, which breaks the anti-affinity policy.
    
    For background, during a multi-create request, we create a
    RequestSpec per instance being created, but conductor only
    sends the first RequestSpec for the first instance to the
    scheduler. This means RequestSpec.num_instances can be >1
    and we can be processing the Nth instance in the list during
    scheduling but working on a RequestSpec for the first instance.
    That is what breaks the resize check in ServerGroupAntiAffinityFilter
    with regard to multi-create.
    
    To resolve this, we update the RequestSpec.instance_uuid when
    filtering hosts for a given instance but we don't persist that
    change to the RequestSpec.
    
    This is a bit clunky, but it kind of comes with the territory of
    how we hack scheduling requests together using a single RequestSpec
    for multi-create requests. An alternative to this is found in change
    I0dd1fa5a70ac169efd509a50b5d69ee5deb8deb7 where we rely on the
    RequestSpec.num_instances field to determine if we're in a multi-create
    situation, but that has its own flaws because num_instances is
    persisted with the RequestSpec which might cause us to re-introduce
    bug 1558532 if we're not careful. In that case we'd have to either
    (1) stop persisting RequestSpec.num_instances or (2) temporarily
    unset it like we do using RequestSpec.reset_forced_destinations()
    during move operations.
    
    Co-Authored-By: Sean Mooney <[email protected]>
    
    Closes-Bug: #1781710
    
    Change-Id: Icba22060cb379ebd5e906981ec283667350b8c5a
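
The failure mode described in the commit message can be reproduced with a
small model. This is only a sketch under stated assumptions, not nova's
code: HostState and RequestSpec here are simplified stand-ins that share
names with the nova objects, while anti_affinity_passes(), schedule() and
the update_uuid flag are hypothetical names made up for illustration.
Restoring the original value at the end is a stand-in for "we don't
persist that change"; the mechanism nova actually uses is not shown.

```python
# Simplified model of the anti-affinity check; NOT nova's actual classes.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HostState:
    name: str
    # UUIDs of instances placed on this host so far in the request.
    instances: Set[str] = field(default_factory=set)


@dataclass
class RequestSpec:
    instance_uuid: str
    # UUIDs of every member of the anti-affinity group.
    group_members: Set[str]


def anti_affinity_passes(host: HostState, spec: RequestSpec) -> bool:
    """Pass if no *other* group member is already on the host.

    The instance named by spec.instance_uuid is excluded so that a
    resize to the same host is allowed (the "wrinkle" in the filter).
    """
    others = (host.instances & spec.group_members) - {spec.instance_uuid}
    return not others


def schedule(hosts: List[HostState], spec: RequestSpec,
             instance_uuids: List[str], update_uuid: bool) -> Dict[str, str]:
    """Pick the first passing host for each instance, reusing one spec."""
    original_uuid = spec.instance_uuid
    placements: Dict[str, str] = {}
    try:
        for uuid in instance_uuids:
            if update_uuid:
                # The fix: point the spec at the instance being scheduled.
                spec.instance_uuid = uuid
            chosen = next(h for h in hosts if anti_affinity_passes(h, spec))
            chosen.instances.add(uuid)  # keep host tracking up to date
            placements[uuid] = chosen.name
    finally:
        # Stand-in for not persisting the temporary change.
        spec.instance_uuid = original_uuid
    return placements


def run(update_uuid: bool) -> Dict[str, str]:
    hosts = [HostState("host1"), HostState("host2")]
    spec = RequestSpec(instance_uuid="a", group_members={"a", "b"})
    return schedule(hosts, spec, ["a", "b"], update_uuid)


print(run(update_uuid=False))  # both members land on host1
print(run(update_uuid=True))   # members land on different hosts
```

Without the update, the spec still names the first instance when the
second one is filtered, so the first instance on host1 looks like "this
instance, being resized" and host1 is accepted again.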


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781710

Title:
  ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity
  failing with "Servers are on the same host"

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Started seeing this recently which looks like a regression:

  http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/job-output.txt.gz#_2018-07-13_19_53_15_275866

  2018-07-13 19:53:15.275866 | primary | {1} tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity [7.164074s] ... FAILED
  2018-07-13 19:53:15.275944 | primary |
  2018-07-13 19:53:15.276012 | primary | Captured traceback:
  2018-07-13 19:53:15.276075 | primary | ~~~~~~~~~~~~~~~~~~~
  2018-07-13 19:53:15.276171 | primary |     Traceback (most recent call last):
  2018-07-13 19:53:15.276452 | primary |       File "tempest/api/compute/admin/test_servers_on_multinodes.py", line 115, in test_create_server_with_scheduler_hint_group_anti_affinity
  2018-07-13 19:53:15.276598 | primary |         'Servers are on the same host: %s' % hosts)
  2018-07-13 19:53:15.276857 | primary |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
  2018-07-13 19:53:15.276965 | primary |         raise self.failureException(msg)
  2018-07-13 19:53:15.277830 | primary |     AssertionError: u'ubuntu-xenial-rax-dfw-0000714118' == u'ubuntu-xenial-rax-dfw-0000714118' : Servers are on the same host: {u'c166e283-477c-4ecf-9c1c-2dcd731a6d6a': u'ubuntu-xenial-rax-dfw-0000714118', u'6eb63e79-122e-45f9-931f-0750047116d1': u'ubuntu-xenial-rax-dfw-0000714118'}

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%5C%22%20AND%20message%3A%5C%22Servers%20are%20on%20the%20same%20host%3A%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

  According to logstash, it looks like this started around July 9.

  These changes merged to nova on July 9 but shouldn't impact this
  scheduling behavior yet:

  https://github.com/openstack/nova/commit/57b0bb374963bdbf0aef910feaccb8f536641c41

  https://github.com/openstack/nova/commit/afc7650e64753ab7687ae2c4f2714d4bb78a4e5a

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781710/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
