[openstack-dev] [nova] Fixing race condition with server groups and affinity policy

2016-08-22 Thread Alex Cantu
According to [1]:

"""

1) It's possible to hit a similar race condition for server groups with the 
"affinity" policy. Suppose we create a new group and then create two instances 
simultaneously. The scheduler sees an empty group for each, assigns them to 
different compute nodes, and the policy is violated. We should add a check in 
_validate_instance_group_policy() to cover the "affinity" case.


2) It's possible to create two instances simultaneously, have them be scheduled 
to conflicting hosts, both of them detect the problem in 
_validate_instance_group_policy(), both of them get sent back for rescheduling, 
and both of them get assigned to conflicting hosts *again*, resulting in an 
error. In order to fix this I propose that instead of checking against all 
other instances in the group, we only check against instances that were created 
before the current instance.

"""


I've been trying to improve upon Chris' solution here[2], but honestly I'm not 
exactly sure if I'm approaching this correctly. Chris' solution is to only 
consider the older members when validating group policy (ignoring any members 
younger than the instance we are validating), eliminating the possibility of 
the two cases mentioned above. I don't really know enough about the scheduler 
and instance group code to validate the integrity of the solution, hence my 
plea for help here :)
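
To make the idea concrete, here is a minimal standalone sketch of "only check against older members". It is not nova code: the Member tuple, older_member_hosts() and violates_policy() are hypothetical names invented for illustration, and breaking creation-time ties by UUID is an assumption of this sketch, not something the bug report specifies.

```python
from collections import namedtuple

# Hypothetical stand-in for a group member; this is NOT nova's Instance object.
Member = namedtuple("Member", ["uuid", "host", "created_at"])


def older_member_hosts(members, instance_uuid):
    """Return the hosts of members created before the given instance.

    Raises StopIteration if instance_uuid is not a member of the group.
    Ties on created_at are broken by UUID (an assumption of this sketch) so
    that two members never each treat the other as "older".
    """
    current = next(m for m in members if m.uuid == instance_uuid)
    return {
        m.host
        for m in members
        if m.uuid != instance_uuid
        and m.host is not None
        and (m.created_at, m.uuid) < (current.created_at, current.uuid)
    }


def violates_policy(policy, my_host, members, instance_uuid):
    """Check one instance against the older members of its group only."""
    hosts = older_member_hosts(members, instance_uuid)
    if policy == "anti-affinity":
        # Conflict only if an older member already occupies this host.
        return my_host in hosts
    if policy == "affinity":
        # Conflict only if older members exist and none of them is on this host.
        return bool(hosts) and my_host not in hosts
    return False
```

With this ordering the oldest instance never sees a conflict, so at least one of two racing instances keeps its host and the other reschedules around it.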


I've attached a git format-patch of my attempt to make the implementation a 
little cleaner. It's the same solution Chris implemented, just moved into the 
get_hosts() method to avoid creating a new remotable method.
I haven't gotten the tests to pass yet; I'm having trouble getting the expected 
filters to work.


I'm pretty new to the OpenStack code base. Is there any way anyone here could 
give me some direction? Is the solution correct? What am I missing? How can I 
fill in the gaps? Is this even still a valid issue?


Thanks!


[1] https://bugs.launchpad.net/nova/+bug/1423648
[2] https://review.openstack.org/#/c/164762/

From b6ea4e26d5feac8c57d6eb0619f29c22c1f917ca Mon Sep 17 00:00:00 2001
From: Miguel Alex Cantu <miguel.ca...@rackspace.com>
Date: Wed, 17 Aug 2016 22:32:55 +
Subject: [PATCH] Fix race in server group policy validation

This is a follow up to Chris Friesen's patch here:
https://review.openstack.org/#/c/164762/9

There is a race condition when validating instances being
simultaneously created as part of the same server group.

It's possible to create two instances simultaneously, have them be
scheduled to hosts that would violate the group policy, have both
of them detect the problem in _validate_instance_group_policy(),
both of them get sent back for rescheduling, and both of them get
assigned to conflicting hosts *again*, resulting in an error.

The fix is to modify the validation code to ignore
group members created after the instance being validated.  This
ensures that at least one of the instances will run on the chosen
host, and the other will reschedule and adapt to the first.

The main difference between my change and Chris' change is that
instead of getting the instances that are older than a given instance,
the younger instances (those created after instance_uuid) are
retrieved and added to the exclude list in get_hosts().

For this to work, I've added a kwarg parameter to get_hosts so
the instance_uuid that is being validated can be passed.

Closes-Bug: #1423648

Signed-off-by: Miguel Alex Cantu <miguel.ca...@rackspace.com>
---
 nova/compute/manager.py|  8 --
 nova/objects/instance_group.py | 39 +-
 nova/tests/unit/objects/test_instance_group.py | 21 ++
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index 2112c33..8b05aee 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1288,7 +1288,9 @@ class ComputeManager(manager.Manager):
         def _do_validation(context, instance, group_hint):
             group = objects.InstanceGroup.get_by_hint(context, group_hint)
             if 'anti-affinity' in group.policies:
-                group_hosts = group.get_hosts(exclude=[instance.uuid])
+                group_hosts = group.get_hosts(
+                    exclude=[instance.uuid],
+                    **{'instance_uuid': instance.uuid})
                 if self.host in group_hosts:
                     msg = _("Anti-affinity instance group policy "
                             "was violated.")
@@ -1296,7 +1298,9 @@ class ComputeManager(manager.Manager):
                         instance_uuid=instance.uuid,
                         reason=msg)
             elif 'affinity' in group.policies:
-                group_hosts = group.get_hosts(exclude=[instance.uuid])
+                group_hosts = group.get_hosts(
+                    exclude=[instance.uuid],
+ 

Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

2016-02-23 Thread Alex Cantu
Nate,


That's right. Initially there wasn't any work done to the Ansible playbooks to 
turn off Aodh alarming when deploying Ceilometer.

Ideally the playbooks would check to see if any alarm hosts are defined. If so, 
then turn on the Aodh configurations within Ceilometer. If not, then leave 
those configurations out.
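
A rough sketch of that check, written in Python rather than in the actual playbook or template syntax. The function name render_database_section() and its parameters are hypothetical, and the option names (metering_connection, event_connection, alarm_connection) are assumed from the [database] section described in this thread; treat this as an illustration of the logic only.

```python
import configparser
import io


def render_database_section(metering_conn, event_conn, alarm_hosts, alarm_conn=None):
    """Render a [database] section, adding the alarm entry only when needed.

    alarm_hosts is whatever list of alarm (Aodh) hosts the deployment defines;
    an empty list means Aodh was not configured.
    """
    # interpolation=None so '%' characters in connection URLs are kept as-is.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["database"] = {
        "metering_connection": metering_conn,
        "event_connection": event_conn,
    }
    # Only emit the alarm connection when alarm hosts are actually defined.
    if alarm_hosts and alarm_conn:
        cfg["database"]["alarm_connection"] = alarm_conn
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Called with an empty alarm_hosts list, the rendered section contains only the metering and event connections, so the API has no nonexistent alarm database to wait on.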


It's worth noting that Ceilometer Alarms are deprecated in Liberty in favor of 
Aodh. If you turn off Aodh, then that feature will not be available to you.

Feel free to file a bug explaining the situation, and if you are feeling up for 
it -- add the logic in to check for Aodh hosts :).


-Alex


From: Potter, Nathaniel <nathaniel.pot...@intel.com>
Sent: Tuesday, February 23, 2016 5:56 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible 
Ceilometer Configuration

Hi Alex,

So it's looking to me like my problem was being caused by openstack-ansible 
trying to set up aodh although I didn't configure it and didn't want to use it. 
In ceilometer.conf I found that in the [database] section the metering and 
event connections were correctly looking for mongodb at the IP I set as my bind 
IP, but it was also adding an alarm connection looking for an aodh user in the 
database at localhost. This was causing the ceilometer API to time out 
repeatedly looking for the connection that didn't exist. I don't have any aodh 
configuration set up in /etc/openstack_deploy, so should that line not have 
been put into my ceilometer.conf?

Thanks,
Nate

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible Ceilometer Configuration

2016-02-17 Thread Alex Cantu
Nate,


The mongodb host can be anywhere, so long as it can be reached by the ceilometer 
containers (on the same network).
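
One quick way to confirm reachability from inside a ceilometer container is a plain TCP connection test. This is a minimal sketch using only the Python standard library; can_reach() is a hypothetical helper, and port 27017 is assumed because it is MongoDB's default (adjust it if your mongod listens elsewhere).

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False
```

For example, run can_reach("<your ceilometer_db_ip>") from the container. A False result points at bind_ip or networking rather than at ceilometer itself. A True result only shows that the port is open, not that authentication works.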

What branch are you working from? Master and Liberty should have no problems as 
far as I'm aware. There is a bug open regarding authentication with swift, 
but everything else should work fine.


Feel free to send over your ceilometer-api, ceilometer-notification-agent, and 
ceilometer-pollster logs on a pastebin so that I can take a look.


-Alex


From: Potter, Nathaniel 
Sent: Wednesday, February 17, 2016 4:17 PM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [ceilometer] [openstack-ansible] OpenStack-Ansible 
Ceilometer Configuration

Hi everyone,

I've been working on setting up a 10 node OpenStack installation with 
ceilometer using openstack-ansible, but the way I've configured it isn't 
working for me. I've tried following these instructions:
http://docs.openstack.org/developer/openstack-ansible/install-guide/configure-ceilometer.html
and did these steps:

- I set up MongoDB on the metering-infra_host, making the bind_ip the 
  br-mgmt IP of that host and creating the ceilometer user.

- In /etc/openstack_deploy/conf.d/ceilometer.yml I have a compute host 
  under metering-compute_hosts and the infra host that I configured MongoDB on 
  in my metering-infra_hosts.

- I also set the ceilometer_db_ip in user_variables to be equal to the 
  bind_ip set on the infra host.

The ceilometer installation playbook runs successfully, but when I log into 
the utility container and try to run ceilometer meter-list it times out and 
says 'Service Unavailable (HTTP 503)'.

Does anyone see where I went wrong in these steps? Should bind_ip be set to 
something else, or should I be configuring this database on the compute host 
rather than the infra host? The documentation wasn't entirely clear on that 
point.

Thanks,
Nate
