Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Matthew Booth
On 11/02/15 16:40, Gary Kotton wrote:
> 
> 
> On 2/11/15, 6:35 PM, "Sylvain Bauza"  wrote:
> 
>>
>> On 11/02/2015 17:04, Gary Kotton wrote:
>>> I posted a fix that does not break things and supports HA.
>>> https://review.openstack.org/154029
>>
>>
>> Just to be clear, HA is *not* supported by Nova now.
>
> That is not correct. It is actually supported if the host_ip is the same. If
> the host_ip is not the same then there is an issue when one of the compute
> nodes restarts - it will delete all instances that do not have its host_ip.

FWIW, I suspect you're correct. My patch enforces that.

So, check out the init code from compute manager:

    def init_host(self):
        """Initialization for a standalone compute service."""
        self.driver.init_host(host=self.host)

The driver is being configured with the Compute service's 'host' config
variable. Let's scan through and look at what else uses that:


    def _update_resource_tracker(self, context, instance):
        """Let the resource tracker know that an instance has changed
        state."""

        if (instance['host'] == self.host and
                self.driver.node_is_available(instance['node'])):

Instances with the old host value will be ignored by
_update_resource_tracker. That doesn't sound good.
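
To make that concrete, here is a minimal sketch of the same comparison in
plain Python. It is not Nova code: the host names, the sample instance and
the always-true availability stub are all made up for illustration.

```python
# Illustrative only: mimics the shape of the check above, not Nova itself.
def should_update_tracker(instance, my_host, node_is_available):
    """Return True only if the instance is recorded against my_host
    and its node is reported as available."""
    return instance['host'] == my_host and node_is_available(instance['node'])


def always_available(node):
    """Stub standing in for the driver's node availability check."""
    return True


# Hypothetical: two compute services front the same cluster, and this
# instance was created through the one whose host value is 'compute-a'.
instance = {'uuid': 'vm-1', 'host': 'compute-a', 'node': 'cluster1'}

print(should_update_tracker(instance, 'compute-a', always_available))  # True
print(should_update_tracker(instance, 'compute-b', always_available))  # False
```

The second service never updates the tracker for that instance, even
though it manages the same cluster.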

There's _destroy_evacuated_instances(), which you already found :)
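
For anyone who hasn't read it: as I read that method, a restarting compute
service asks the driver which instances it can see, and destroys the ones
recorded against a different host on the assumption that they were
evacuated. A heavily simplified sketch of that selection follows. It uses
plain Python with made-up names and data, and it is not the Nova code.

```python
def instances_treated_as_evacuated(instances_on_driver, my_host):
    """Return the instances visible to the driver that are recorded
    against some other host value."""
    return [inst for inst in instances_on_driver if inst['host'] != my_host]


# Hypothetical: both services see the whole cluster, but each one created
# its instances under its own host value.
visible = [
    {'uuid': 'vm-1', 'host': 'compute-a'},
    {'uuid': 'vm-2', 'host': 'compute-b'},
]

doomed = instances_treated_as_evacuated(visible, 'compute-b')
print([inst['uuid'] for inst in doomed])  # ['vm-1']
```

If both services share one host value the list comes back empty, which is
why using the same name avoids the deletions.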

There's this in _validate_instance_group_policy:

    group_hosts = group.get_hosts(context, exclude=[instance.uuid])
    if self.host in group_hosts:
        msg = _("Anti-affinity instance group policy was violated.")

You've changed group affinity.
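
A small sketch of why, using plain Python and made-up host names rather
than the real group objects. The check only compares host values, so two
instances that land on the same cluster through differently named compute
services look as if they are on different hosts.

```python
def violates_anti_affinity(my_host, group_hosts):
    """Mirror the membership test above: a violation is only detected
    when another group member is recorded against the same host value."""
    return my_host in group_hosts


# Hypothetical: another member of the group was booted via 'compute-a',
# which manages the same cluster as 'compute-b'.
group_hosts = ['compute-a']

print(violates_anti_affinity('compute-a', group_hosts))  # True
print(violates_anti_affinity('compute-b', group_hosts))  # False
```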

There's _check_instance_build_time:

    filters = {'vm_state': vm_states.BUILDING,
               'host': self.host}

Nova won't check instances created by the old host to see if they're stuck.

There's this in _heal_instance_info_cache:

    db_instances = objects.InstanceList.get_by_host(
        context, self.host, expected_attrs=[], use_slave=True)

This cleanup job won't find instances created by the old host.

There's this in _poll_rebooting_instances:

    filters = {'task_state':
               [task_states.REBOOTING,
                task_states.REBOOT_STARTED,
                task_states.REBOOT_PENDING],
               'host': self.host}

Nova won't poll instances created by the old host.
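
Those last three are all the same pattern: a periodic task builds a query
filtered on 'host'. Here is a toy version of that kind of filter, just to
show what drops out of the result. The helper, the sample rows and the
task state strings are my own placeholders, not the Nova DB API.

```python
def filter_instances(instances, filters):
    """Return instances matching every filter. A list value means
    'any of these'; any other value must match exactly."""
    def matches(inst):
        for key, wanted in filters.items():
            allowed = wanted if isinstance(wanted, list) else [wanted]
            if inst.get(key) not in allowed:
                return False
        return True
    return [inst for inst in instances if matches(inst)]


# Hypothetical rows: both instances are rebooting on the same cluster,
# but were created through different compute services.
instances = [
    {'uuid': 'vm-1', 'host': 'compute-a', 'task_state': 'rebooting'},
    {'uuid': 'vm-2', 'host': 'compute-b', 'task_state': 'rebooting'},
]

filters = {'task_state': ['rebooting', 'reboot_started', 'reboot_pending'],
           'host': 'compute-b'}
found = filter_instances(instances, filters)
print([inst['uuid'] for inst in found])  # ['vm-2']
```

Nothing running as 'compute-b' will ever pick up vm-1, and if 'compute-a'
is down nothing else will either.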

This is just a cursory flick through. I'm fairly sure this is going to
be a lot of work to fix. My patch just ensures that Nova refuses to
start instead of letting bad things happen. If you ensure that
'self.host' in the above code is the same for all HA nodes I don't see
why it shouldn't work, though. My patch won't prevent that.
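
The "refuse to start" idea can be illustrated in the abstract. This is a
sketch with invented names and data, not the code in the review above: a
guard compares the host recorded on each instance the driver can see
against our own, and bails out on a mismatch instead of carrying on.

```python
class HostMismatchError(Exception):
    """Raised when visible instances are recorded against another host."""


def check_host_consistency(instances_on_driver, my_host):
    """Raise instead of starting if any instance the driver can see is
    recorded against a different host value."""
    other_hosts = sorted({inst['host'] for inst in instances_on_driver
                          if inst['host'] != my_host})
    if other_hosts:
        raise HostMismatchError(
            "instances are recorded against other hosts: %s"
            % ", ".join(other_hosts))


# Same host value everywhere: the check passes silently.
check_host_consistency([{'uuid': 'vm-1', 'host': 'compute-a'}], 'compute-a')

# Different host value: the service would refuse to start.
try:
    check_host_consistency([{'uuid': 'vm-1', 'host': 'compute-a'}],
                           'compute-b')
except HostMismatchError as exc:
    print("refusing to start:", exc)
```

A real check would also have to tell this case apart from a genuine
evacuation, which the sketch makes no attempt to do.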

Matt

> 
>>
>> The main reason is that compute *nodes* are considered to be given by the
>> hypervisor (i.e. the virt driver run by the compute manager worker), so
>> if 2 or more hypervisors on two distinct machines are getting the same
>> list of nodes, then you would have duplicates.
> 
> No. There are no duplicates.
> 
>>
>> -Sylvain
>>

Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton


On 2/11/15, 6:35 PM, "Sylvain Bauza"  wrote:

>
>On 11/02/2015 17:04, Gary Kotton wrote:
>> I posted a fix that does not break things and supports HA.
>> https://review.openstack.org/154029
>
>
>Just to be clear, HA is *not* supported by Nova now.

That is not correct. It is actually supported if the host_ip is the same. If
the host_ip is not the same then there is an issue when one of the compute
nodes restarts - it will delete all instances that do not have its host_ip.

>
>The main reason is that compute *nodes* are considered to be given by the
>hypervisor (i.e. the virt driver run by the compute manager worker), so
>if 2 or more hypervisors on two distinct machines are getting the same
>list of nodes, then you would have duplicates.

No. There are no duplicates.

>
>-Sylvain
>


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Sylvain Bauza


On 11/02/2015 17:04, Gary Kotton wrote:
> I posted a fix that does not break things and supports HA.
> https://review.openstack.org/154029

Just to be clear, HA is *not* supported by Nova now.

The main reason is that compute *nodes* are considered to be given by the
hypervisor (i.e. the virt driver run by the compute manager worker), so
if 2 or more hypervisors on two distinct machines are getting the same
list of nodes, then you would have duplicates.

-Sylvain



Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton
I posted a fix that does not break things and supports HA.
https://review.openstack.org/154029

On 2/11/15, 5:55 PM, "Matthew Booth"  wrote:

>On 11/02/15 15:49, Gary Kotton wrote:
>> Hi,
>> I do not think that that is a healthy solution. That effectively would
>> render a cluster down if the compute node goes down. That would be a real
>> disaster. The ugly work around is setting the host names to be the same
>> value.
>
>I don't think that's an ugly work around. I think that's the only
>currently viable solution.
>
>> This is something that we should discuss at the next summit and I would
>> hope to propose a topic to talk about.
>
>Sounds like a good plan. However, given that the bug is marked Critical
>I was assuming we wanted a more expedient fix, which is what I've
>proposed.
>
>Matt
>
>> Thanks
>> Gary
>> 
>> On 2/11/15, 5:31 PM, "Matthew Booth"  wrote:
>> 
>>> I just posted this:
>>>
>>> https://review.openstack.org/#/c/154907/
>>>
>>> as an alternative fix for critical bug:
>>>
>>> https://bugs.launchpad.net/nova/+bug/1419785
>>>
>>> I've just knocked this up quickly for illustration: it obviously needs
>>> plenty of cleanup. I have confirmed that it works, though.
>>>
>>> Before I take it any further, though, I'd like to get some feedback on
>>> the approach. I prefer this to the alternative, because the underlying
>>> problem is deeper than supporting evacuate. I'd prefer to be honest with
>>> the user and just say it ain't gonna work. The alternative would leave
>>> Nova running in a broken state, leaving inconsistent state in its wake
>>> as it runs.
>>>
>>> Matt
>>>
>>> -- 
>>> Matthew Booth
>>> Red Hat Engineering, Virtualisation Team
>>>
>>> Phone: +442070094448 (UK)
>>> GPG ID:  D33C3490
>>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>
>
>-- 
>Matthew Booth
>Red Hat Engineering, Virtualisation Team
>
>Phone: +442070094448 (UK)
>GPG ID:  D33C3490
>GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Matthew Booth
On 11/02/15 15:49, Gary Kotton wrote:
> Hi,
> I do not think that that is a healthy solution. That effectively would
> render a cluster down if the compute node goes down. That would be a real
> disaster. The ugly work around is setting the host names to be the same
> value.

I don't think that's an ugly work around. I think that's the only
currently viable solution.

> This is something that we should discuss at the next summit and I would
> hope to propose a topic to talk about.

Sounds like a good plan. However, given that the bug is marked Critical
I was assuming we wanted a more expedient fix, which is what I've proposed.

Matt

> Thanks
> Gary
> 
> On 2/11/15, 5:31 PM, "Matthew Booth"  wrote:
> 
>> I just posted this:
>>
>> https://review.openstack.org/#/c/154907/
>>
>> as an alternative fix for critical bug:
>>
>> https://bugs.launchpad.net/nova/+bug/1419785
>>
>> I've just knocked this up quickly for illustration: it obviously needs
>> plenty of cleanup. I have confirmed that it works, though.
>>
>> Before I take it any further, though, I'd like to get some feedback on
>> the approach. I prefer this to the alternative, because the underlying
>> problem is deeper than supporting evacuate. I'd prefer to be honest with
>> the user and just say it ain't gonna work. The alternative would leave
>> Nova running in a broken state, leaving inconsistent state in its wake
>> as it runs.
>>
>> Matt
>>
>> -- 
>> Matthew Booth
>> Red Hat Engineering, Virtualisation Team
>>
>> Phone: +442070094448 (UK)
>> GPG ID:  D33C3490
>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490



Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton
Hi,
I do not think that that is a healthy solution. That effectively would
render a cluster down if the compute node goes down. That would be a real
disaster. The ugly work around is setting the host names to be the same
value.
This is something that we should discuss at the next summit and I would
hope to propose a topic to talk about.
Thanks
Gary

On 2/11/15, 5:31 PM, "Matthew Booth"  wrote:

>I just posted this:
>
>https://review.openstack.org/#/c/154907/
>
>as an alternative fix for critical bug:
>
>https://bugs.launchpad.net/nova/+bug/1419785
>
>I've just knocked this up quickly for illustration: it obviously needs
>plenty of cleanup. I have confirmed that it works, though.
>
>Before I take it any further, though, I'd like to get some feedback on
>the approach. I prefer this to the alternative, because the underlying
>problem is deeper than supporting evacuate. I'd prefer to be honest with
>the user and just say it ain't gonna work. The alternative would leave
>Nova running in a broken state, leaving inconsistent state in its wake
>as it runs.
>
>Matt
>
>-- 
>Matthew Booth
>Red Hat Engineering, Virtualisation Team
>
>Phone: +442070094448 (UK)
>GPG ID:  D33C3490
>GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


[openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Matthew Booth
I just posted this:

https://review.openstack.org/#/c/154907/

as an alternative fix for critical bug:

https://bugs.launchpad.net/nova/+bug/1419785

I've just knocked this up quickly for illustration: it obviously needs
plenty of cleanup. I have confirmed that it works, though.

Before I take it any further, though, I'd like to get some feedback on
the approach. I prefer this to the alternative, because the underlying
problem is deeper than supporting evacuate. I'd prefer to be honest with
the user and just say it ain't gonna work. The alternative would leave
Nova running in a broken state, leaving inconsistent state in its wake
as it runs.

Matt

-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
