Reviewed:  https://review.openstack.org/583347
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c22b53c2481bac518a6b32cdee7b7df23d91251e
Submitter: Zuul
Branch:    master

commit c22b53c2481bac518a6b32cdee7b7df23d91251e
Author: Matt Riedemann <[email protected]>
Date:   Tue Jul 17 17:43:37 2018 -0400

    Update RequestSpec.instance_uuid during scheduling

    Before change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb the
    ServerGroupAntiAffinityFilter did not rely on the
    HostState.instances dict to determine **within the same request**
    how many members of the same anti-affinity group were on a given
    host. In fact, at that time, the HostState.instances dict wasn't
    updated between filter runs in the same multi-create request. That
    was fixed with change Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b so
    that as we select a host for each group member being created within
    a single request, we also update the HostState.instances dict so
    the ServerGroupAntiAffinityFilter would have accurate tracking of
    which group members were on which hosts.

    However, that did not account for a wrinkle in the filter added in
    change Ie016f59f5b98bb9c70b3e33556bd747f79fc77bd which is needed to
    allow resizing a server to the same host when that server is in an
    anti-affinity group. That wrinkle, combined with the fact the
    RequestSpec the filter is acting upon for a given instance in a
    multi-create request might not actually have the same instance_uuid
    can cause the filter to think it's in a resize situation and accept
    a host which already has another member from the same anti-affinity
    group on it, which breaks the anti-affinity policy.

    For background, during a multi-create request, we create a
    RequestSpec per instance being created, but conductor only sends
    the first RequestSpec for the first instance to the scheduler. This
    means RequestSpec.num_instances can be >1 and we can be processing
    the Nth instance in the list during scheduling but working on a
    RequestSpec for the first instance. That is what breaks the resize
    check in ServerGroupAntiAffinityFilter with regard to multi-create.

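The interaction described above can be illustrated with a small standalone sketch. It is not nova code: FakeRequestSpec, FakeHostState, anti_affinity_passes and schedule are hypothetical names invented for this example, the filter is reduced to the resize check plus a plain "no other group member on this host" rule, and hosts are tried in a fixed order rather than weighed. With update_uuid=False the spec keeps the first instance's UUID for the whole request, so the second instance trips the resize check; with update_uuid=True the spec's instance_uuid is updated for each instance being scheduled, as the commit title describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeRequestSpec:
    """Hypothetical stand-in for RequestSpec (only the fields used here)."""
    instance_uuid: str
    group_members: Set[str]


@dataclass
class FakeHostState:
    """Hypothetical stand-in for HostState; tracks instance UUIDs on a host."""
    name: str
    instances: Set[str] = field(default_factory=set)


def anti_affinity_passes(host: FakeHostState, spec: FakeRequestSpec) -> bool:
    # Resize wrinkle: if the instance named in the spec is already on this
    # host, assume a resize to the same host and accept it.
    if spec.instance_uuid in host.instances:
        return True
    # Otherwise reject any host that already holds a member of the group.
    return not (host.instances & spec.group_members)


def schedule(instance_uuids: List[str], update_uuid: bool) -> Dict[str, str]:
    hosts = [FakeHostState("host1"), FakeHostState("host2")]
    # Only the spec for the first instance reaches the scheduler.
    spec = FakeRequestSpec(instance_uuid=instance_uuids[0],
                           group_members=set(instance_uuids))
    placements: Dict[str, str] = {}
    for uuid in instance_uuids:
        if update_uuid:
            # Point the spec at the instance currently being scheduled.
            # This sketch has no persistence layer, so nothing is saved.
            spec.instance_uuid = uuid
        chosen = next(h for h in hosts if anti_affinity_passes(h, spec))
        # Track the selection so later filter runs in this request see it.
        chosen.instances.add(uuid)
        placements[uuid] = chosen.name
    return placements


print(schedule(["uuid-a", "uuid-b"], update_uuid=False))
print(schedule(["uuid-a", "uuid-b"], update_uuid=True))
```

In the first call both instances land on host1, because the stale instance_uuid makes the second pass look like a resize to the same host. In the second call the resize check no longer matches and the second instance is placed on host2.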
    To resolve this, we update the RequestSpec.instance_uuid when
    filtering hosts for a given instance but we don't persist that
    change to the RequestSpec.

    This is a bit clunky, but it kind of comes with the territory of
    how we hack scheduling requests together using a single RequestSpec
    for multi-create requests. An alternative to this is found in
    change I0dd1fa5a70ac169efd509a50b5d69ee5deb8deb7 where we rely on
    the RequestSpec.num_instances field to determine if we're in a
    multi-create situation, but that has its own flaws because
    num_instances is persisted with the RequestSpec which might cause
    us to re-introduce bug 1558532 if we're not careful. In that case
    we'd have to either (1) stop persisting RequestSpec.num_instances
    or (2) temporarily unset it like we do using
    RequestSpec.reset_forced_destinations() during move operations.

    Co-Authored-By: Sean Mooney <[email protected]>

    Closes-Bug: #1781710

    Change-Id: Icba22060cb379ebd5e906981ec283667350b8c5a

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781710

Title:
  ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity
  failing with "Servers are on the same host"

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Started seeing this recently which looks like a regression:

  http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/job-output.txt.gz#_2018-07-13_19_53_15_275866

  2018-07-13 19:53:15.275866 | primary | {1} tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity [7.164074s] ... FAILED
  2018-07-13 19:53:15.275944 | primary |
  2018-07-13 19:53:15.276012 | primary | Captured traceback:
  2018-07-13 19:53:15.276075 | primary | ~~~~~~~~~~~~~~~~~~~
  2018-07-13 19:53:15.276171 | primary |     Traceback (most recent call last):
  2018-07-13 19:53:15.276452 | primary |       File "tempest/api/compute/admin/test_servers_on_multinodes.py", line 115, in test_create_server_with_scheduler_hint_group_anti_affinity
  2018-07-13 19:53:15.276598 | primary |         'Servers are on the same host: %s' % hosts)
  2018-07-13 19:53:15.276857 | primary |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
  2018-07-13 19:53:15.276965 | primary |         raise self.failureException(msg)
  2018-07-13 19:53:15.277830 | primary |     AssertionError: u'ubuntu-xenial-rax-dfw-0000714118' == u'ubuntu-xenial-rax-dfw-0000714118' : Servers are on the same host: {u'c166e283-477c-4ecf-9c1c-2dcd731a6d6a': u'ubuntu-xenial-rax-dfw-0000714118', u'6eb63e79-122e-45f9-931f-0750047116d1': u'ubuntu-xenial-rax-dfw-0000714118'}

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%5C%22%20AND%20message%3A%5C%22Servers%20are%20on%20the%20same%20host%3A%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

  According to logstash, it looks like this started around July 9.

  These changes merged to nova on July 9 but shouldn't impact this
  scheduling behavior yet:

  https://github.com/openstack/nova/commit/57b0bb374963bdbf0aef910feaccb8f536641c41

  https://github.com/openstack/nova/commit/afc7650e64753ab7687ae2c4f2714d4bb78a4e5a

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781710/+subscriptions

