[openstack-dev] [nova] Fixing race condition with server groups and affinity policy
According to [1]:

"""
1) It's possible to hit a similar race condition for server groups with the "affinity" policy. Suppose we create a new group and then create two instances simultaneously. The scheduler sees an empty group for each, assigns them to different compute nodes, and the policy is violated. We should add a check in _validate_instance_group_policy() to cover the "affinity" case.

2) It's possible to create two instances simultaneously, have them be scheduled to conflicting hosts, both of them detect the problem in _validate_instance_group_policy(), both of them get sent back for rescheduling, and both of them get assigned to conflicting hosts *again*, resulting in an error.

In order to fix this I propose that instead of checking against all other instances in the group, we only check against instances that were created before the current instance.
"""

I've been trying to improve upon Chris' solution here [2], but honestly I'm not sure I'm approaching this correctly. Chris' solution is to consider only the older members when validating the group policy (ignoring any members younger than the instance being validated), which eliminates both cases mentioned above. I don't know enough about the scheduler and instance group code to verify that the solution is sound, hence my plea for help here :)

I've attached a git format-patch of my attempt to make the implementation a little cleaner. It's the same solution Chris implemented, just moved into the get_hosts() method to avoid creating a new remotable method. I haven't gotten the tests to pass; I'm having trouble getting the expected filters to work. I'm pretty new to the OpenStack code base. Could anyone here give me some direction? Is the solution correct? What am I missing? How can I fill in the gaps? Is this even still a valid issue?

Thanks!
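A minimal sketch of the "only check against older members" idea, using a hypothetical Member record with a created_at timestamp in place of Nova's Instance objects (Member, older_member_hosts() and policy_ok() are illustrative names, not Nova APIs). It covers both the "anti-affinity" case and the "affinity" case from point 1, and shows why, of two racing instances, only the younger one is sent back for rescheduling:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Member:
    """Hypothetical stand-in for a server group member (not Nova's Instance)."""
    uuid: str
    host: str
    created_at: float  # creation timestamp; smaller means older


def _age_key(m: Member) -> Tuple[float, str]:
    # Break created_at ties on uuid so that, of any two members,
    # exactly one counts as the older.
    return (m.created_at, m.uuid)


def older_member_hosts(members: List[Member], me: Member) -> Set[str]:
    """Hosts of group members created before `me`; younger members are ignored."""
    return {m.host for m in members if _age_key(m) < _age_key(me)}


def policy_ok(policy: str, members: List[Member], me: Member) -> bool:
    """Return True if `me` may stay on its host under `policy`."""
    hosts = older_member_hosts(members, me)
    if policy == "anti-affinity":
        # Violated if an older member already runs on this host.
        return me.host not in hosts
    if policy == "affinity":
        # Violated if any older member runs on a different host
        # (an empty set means `me` is the oldest member and may go anywhere).
        return hosts <= {me.host}
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # Two instances race into an empty "affinity" group and land on different hosts.
    a = Member("a", "host1", created_at=1.0)
    b = Member("b", "host2", created_at=2.0)
    print(policy_ok("affinity", [a, b], a))  # True: no older members
    print(policy_ok("affinity", [a, b], b))  # False: b must reschedule
```

Because the older instance never sees the younger one, the two can no longer reject each other. The uuid tie-break matters: if two members had identical timestamps and neither counted as older, both would pass validation and the race would reappear.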
[1] https://bugs.launchpad.net/nova/+bug/1423648
[2] https://review.openstack.org/#/c/164762/

From b6ea4e26d5feac8c57d6eb0619f29c22c1f917ca Mon Sep 17 00:00:00 2001
From: Miguel Alex Cantu <miguel.ca...@rackspace.com>
Date: Wed, 17 Aug 2016 22:32:55 +
Subject: [PATCH] Fix race in server group policy validation

This is a follow-up to Chris Friesen's patch here:
https://review.openstack.org/#/c/164762/9

There is a race condition when validating instances that are being created simultaneously as part of the same server group. It's possible to create two instances simultaneously, have them be scheduled to hosts that would violate the group policy, have both of them detect the problem in _validate_instance_group_policy(), have both of them sent back for rescheduling, and have both of them assigned to conflicting hosts *again*, resulting in an error.

The fix is to modify the validation code to ignore group members created after the instance being validated. This ensures that at least one of the instances will run on the chosen host, and the other will reschedule and adapt to the first.

The main difference between my change and Chris' change is that instead of retrieving the instances that are older than a given instance, the younger instances (instances created after the instance identified by instance_uuid) are retrieved and added to the exclude list in get_hosts(). For this to work, I've added a kwarg parameter to get_hosts() so that the UUID of the instance being validated can be passed in.
Closes-Bug: #1423648
Signed-off-by: Miguel Alex Cantu <miguel.ca...@rackspace.com>
---
 nova/compute/manager.py                        |  8 --
 nova/objects/instance_group.py                 | 39 +-
 nova/tests/unit/objects/test_instance_group.py | 21 ++
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index 2112c33..8b05aee 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1288,7 +1288,9 @@ class ComputeManager(manager.Manager):
         def _do_validation(context, instance, group_hint):
             group = objects.InstanceGroup.get_by_hint(context, group_hint)
             if 'anti-affinity' in group.policies:
-                group_hosts = group.get_hosts(exclude=[instance.uuid])
+                group_hosts = group.get_hosts(
+                    exclude=[instance.uuid],
+                    **{'instance_uuid': instance.uuid})
                 if self.host in group_hosts:
                     msg = _("Anti-affinity instance group policy "
                             "was violated.")
@@ -1296,7 +1298,9 @@ class ComputeManager(manager.Manager):
                             instance_uuid=instance.uuid,
                             reason=msg)
             elif 'affinity' in group.policies:
-                group_hosts = group.get_hosts(exclude=[instance.uuid])
+                group_hosts = group.get_hosts(
+                    exclude=[instance.uuid],
+
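The get_hosts() change described in the commit message (younger members added to the exclude list, with the validated instance's UUID passed as an extra keyword argument) can be sketched as a plain function. This is a hypothetical, simplified analogue, not the actual InstanceGroup.get_hosts() code: members are assumed to be dicts with 'uuid', 'host' and 'created_at' keys, and the uuid tie-break is an addition of this sketch.

```python
from typing import Iterable, List, Optional, Set


def get_hosts(members: List[dict],
              exclude: Optional[Iterable[str]] = None,
              instance_uuid: Optional[str] = None) -> List[str]:
    """Hypothetical, simplified analogue of the patched InstanceGroup.get_hosts().

    `members` is a list of dicts with 'uuid', 'host' and 'created_at' keys;
    this data model is an assumption for illustration, not Nova's.
    """
    excluded: Set[str] = set(exclude or [])
    if instance_uuid is not None:
        by_uuid = {m["uuid"]: m for m in members}
        me = by_uuid[instance_uuid]  # KeyError if the uuid is not a member
        my_key = (me["created_at"], me["uuid"])
        # Add every member younger than the instance being validated
        # (created after it; ties broken on uuid) to the exclude list.
        excluded.update(m["uuid"] for m in members
                        if (m["created_at"], m["uuid"]) > my_key)
    # Distinct hosts of the remaining members, skipping unscheduled ones.
    return sorted({m["host"] for m in members
                   if m["uuid"] not in excluded and m["host"]})


if __name__ == "__main__":
    members = [
        {"uuid": "a", "host": "host1", "created_at": 1.0},
        {"uuid": "b", "host": "host2", "created_at": 2.0},
    ]
    print(get_hosts(members, exclude=["b"], instance_uuid="b"))  # ['host1']
    print(get_hosts(members, exclude=["a"], instance_uuid="a"))  # []
    # Dict unpacking, as in the diff, passes exactly the same keyword argument:
    print(get_hosts(members, exclude=["b"], **{"instance_uuid": "b"}))  # ['host1']
```

The last call illustrates that `**{'instance_uuid': instance.uuid}` in the diff is equivalent to the plain keyword argument `instance_uuid=instance.uuid`.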
Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration
Nate,

That's right. Initially, no work was done in the Ansible playbooks to turn off Aodh alarming when deploying Ceilometer. Ideally, the playbooks would check whether any alarm hosts are defined: if so, enable the Aodh configuration within Ceilometer; if not, leave it out.

It's worth noting that Ceilometer alarms are deprecated in Liberty in favor of Aodh. If you turn off Aodh, that feature will not be available to you.

Feel free to file a bug explaining the situation, and, if you're up for it, add the logic to check for Aodh hosts :).

-Alex

From: Potter, Nathaniel <nathaniel.pot...@intel.com>
Sent: Tuesday, February 23, 2016 5:56 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

Hi Alex,

It looks to me like my problem was caused by openstack-ansible trying to set up Aodh even though I didn't configure it and didn't want to use it. In ceilometer.conf, the metering and event connections in the [database] section correctly pointed to MongoDB at the IP I set as my bind IP, but there was also an alarm connection looking for an aodh user in a database at localhost. This caused the Ceilometer API to time out repeatedly while looking for a connection that didn't exist. I don't have any Aodh configuration in /etc/openstack_deploy, so should that line have been left out of my ceilometer.conf?

Thanks,
Nate

From: Alex Cantu [mailto:miguel.ca...@rackspace.com]
Sent: Wednesday, February 17, 2016 4:48 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

Nate,

The MongoDB host can be anywhere, as long as it can be reached by the Ceilometer containers (on the same network). What branch are you working from? As far as I'm aware, master and Liberty should have no problems. There is an open bug regarding authentication with Swift, but everything else should work fine. Feel free to post your ceilometer-api, ceilometer-notification-agent, and ceilometer-pollster logs to a pastebin so that I can take a look.

-Alex

From: Potter, Nathaniel <nathaniel.pot...@intel.com>
Sent: Wednesday, February 17, 2016 4:17 PM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

Hi everyone,

I've been working on setting up a 10-node OpenStack installation with Ceilometer using openstack-ansible, but the way I've configured it isn't working for me. I've tried following these instructions http://docs.openstack.org/developer/openstack-ansible/install-guide/configure-ceilometer.html, doing these steps:

- I set up MongoDB on the metering-infra_host, making the bind_ip the br-mgmt IP of that host, and created the ceilometer user.
- In /etc/openstack_deploy/conf.d/ceilometer.yml, I have a compute host under metering-compute_hosts and the infra host that I configured MongoDB on under metering-infra_hosts.
- I also set ceilometer_db_ip in user_variables to the bind_ip set on the infra host.

Running the Ceilometer installation playbook succeeds, but when I log into the utility container and run ceilometer meter-list, it times out with 'Service Unavailable (HTTP 503)'. Does anyone see where I went wrong in these steps? Should bind_ip be set to something else, or should I configure this database on the compute host rather than the infra host? The documentation wasn't entirely clear on that point.

Thanks,
Nate

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
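Alex's suggested check (render the Aodh alarm connection only when alarm hosts are defined) can be sketched in Python. In practice the logic would live in the Ansible role's templates; here the inventory is a plain dict, and the group name metering-alarm_hosts, the function name and the placeholder URLs are assumptions for illustration. The option names metering_connection, event_connection and alarm_connection are assumed to match the [database] entries Nate describes.

```python
import configparser
import io
from typing import Dict, List, Optional

# Assumed inventory group name for Aodh alarm hosts (hypothetical).
ALARM_GROUP = "metering-alarm_hosts"


def render_database_section(inventory: Dict[str, List[str]],
                            metering_url: str,
                            event_url: str,
                            alarm_url: Optional[str] = None) -> str:
    """Render ceilometer.conf's [database] section.

    alarm_connection is written only if at least one alarm host is defined,
    so a deployment without Aodh never points Ceilometer at a missing database.
    """
    # interpolation=None keeps '%' characters in connection URLs literal.
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {
        "metering_connection": metering_url,
        "event_connection": event_url,
    }
    if inventory.get(ALARM_GROUP) and alarm_url:
        config["database"]["alarm_connection"] = alarm_url
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # No alarm hosts defined: only metering and event connections are rendered.
    print(render_database_section(
        {"metering-infra_hosts": ["infra1"]},
        metering_url="mongodb://ceilometer:PASSWORD@DB_IP:27017/ceilometer",
        event_url="mongodb://ceilometer:PASSWORD@DB_IP:27017/ceilometer",
        alarm_url="mysql://aodh:PASSWORD@localhost/aodh"))
```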
Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration
Nate,

The MongoDB host can be anywhere, as long as it can be reached by the Ceilometer containers (on the same network). What branch are you working from? As far as I'm aware, master and Liberty should have no problems. There is an open bug regarding authentication with Swift, but everything else should work fine. Feel free to post your ceilometer-api, ceilometer-notification-agent, and ceilometer-pollster logs to a pastebin so that I can take a look.

-Alex

From: Potter, Nathaniel
Sent: Wednesday, February 17, 2016 4:17 PM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

Hi everyone,

I've been working on setting up a 10-node OpenStack installation with Ceilometer using openstack-ansible, but the way I've configured it isn't working for me. I've tried following these instructions http://docs.openstack.org/developer/openstack-ansible/install-guide/configure-ceilometer.html, doing these steps:

- I set up MongoDB on the metering-infra_host, making the bind_ip the br-mgmt IP of that host, and created the ceilometer user.
- In /etc/openstack_deploy/conf.d/ceilometer.yml, I have a compute host under metering-compute_hosts and the infra host that I configured MongoDB on under metering-infra_hosts.
- I also set ceilometer_db_ip in user_variables to the bind_ip set on the infra host.

Running the Ceilometer installation playbook succeeds, but when I log into the utility container and run ceilometer meter-list, it times out with 'Service Unavailable (HTTP 503)'. Does anyone see where I went wrong in these steps? Should bind_ip be set to something else, or should I configure this database on the compute host rather than the infra host? The documentation wasn't entirely clear on that point.
Thanks,
Nate
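Since the MongoDB host only has to be reachable from the Ceilometer containers, a quick TCP check run from inside one of them can rule out a network or bind_ip problem before digging into logs. A minimal sketch, assuming mongod listens on its default port 27017; can_reach() is a hypothetical helper, and a successful connection says nothing about whether the ceilometer user's credentials are correct:

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

For example, calling can_reach() with the ceilometer_db_ip value from user_variables inside a Ceilometer container should return True; False points at routing, firewalling, or a bind_ip that does not include the address being used.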