Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton
I posted a fix that does not break things and supports HA.
https://review.openstack.org/154029

On 2/11/15, 5:55 PM, Matthew Booth mbo...@redhat.com wrote:

On 11/02/15 15:49, Gary Kotton wrote:
 Hi,
 I do not think that that is a healthy solution. That effectively would
 render a cluster down if the compute node goes down. That would be a real
 disaster. The ugly work around is setting the host names to be the same
 value.

I don't think that's an ugly work around. I think that's the only
currently viable solution.

 This is something that we should discuss at the next summit and I would
 hope to propose a topic to talk about.

Sounds like a good plan. However, given that the bug is marked Critical
I was assuming we wanted a more expedient fix, which is what I've
proposed.

Matt

 Thanks
 Gary
 
 On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:
 
 I just posted this:

 https://review.openstack.org/#/c/154907/

 as an alternative fix for critical bug:

 https://bugs.launchpad.net/nova/+bug/1419785

 I've just knocked this up quickly for illustration: it obviously needs
 plenty of cleanup. I have confirmed that it works, though.

 Before I take it any further, though, I'd like to get some feedback on
 the approach. I prefer this to the alternative, because the underlying
 problem is deeper than supporting evacuate. I'd prefer to be honest with
 the user and just say it ain't gonna work. The alternative would leave
 Nova running in a broken state, leaving inconsistent state in its wake
 as it runs.

 Matt

 -- 
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

 

__
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 


-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490



Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Sylvain Bauza


On 11/02/2015 17:04, Gary Kotton wrote:

I posted a fix that does not break things and supports HA.
https://review.openstack.org/154029



Let's just be clear: HA is *not* supported by Nova now.

The main reason is that compute *nodes* are considered to be provided by the
hypervisor (i.e. the virt driver run by the compute manager worker), so
if two or more hypervisors on two distinct machines report the same
list of nodes, you would have duplicates.


-Sylvain


On 2/11/15, 5:55 PM, Matthew Booth mbo...@redhat.com wrote:


On 11/02/15 15:49, Gary Kotton wrote:

Hi,
I do not think that that is a healthy solution. That effectively would
render a cluster down if the compute node goes down. That would be a real
disaster. The ugly work around is setting the host names to be the same
value.

I don't think that's an ugly work around. I think that's the only
currently viable solution.


This is something that we should discuss at the next summit and I would
hope to propose a topic to talk about.

Sounds like a good plan. However, given that the bug is marked Critical
I was assuming we wanted a more expedient fix, which is what I've
proposed.

Matt


Thanks
Gary

On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:


I just posted this:

https://review.openstack.org/#/c/154907/

as an alternative fix for critical bug:

https://bugs.launchpad.net/nova/+bug/1419785

I've just knocked this up quickly for illustration: it obviously needs
plenty of cleanup. I have confirmed that it works, though.

Before I take it any further, though, I'd like to get some feedback on
the approach. I prefer this to the alternative, because the underlying
problem is deeper than supporting evacuate. I'd prefer to be honest with
the user and just say it ain't gonna work. The alternative would leave
Nova running in a broken state, leaving inconsistent state in its wake
as it runs.

Matt

--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490






--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490



Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Matthew Booth
On 11/02/15 15:49, Gary Kotton wrote:
 Hi,
 I do not think that that is a healthy solution. That effectively would
 render a cluster down if the compute node goes down. That would be a real
 disaster. The ugly work around is setting the host names to be the same
 value.

I don't think that's an ugly work around. I think that's the only
currently viable solution.

 This is something that we should discuss at the next summit and I would
 hope to propose a topic to talk about.

Sounds like a good plan. However, given that the bug is marked Critical
I was assuming we wanted a more expedient fix, which is what I've proposed.

Matt

 Thanks
 Gary
 
 On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:
 
 I just posted this:

 https://review.openstack.org/#/c/154907/

 as an alternative fix for critical bug:

 https://bugs.launchpad.net/nova/+bug/1419785

 I've just knocked this up quickly for illustration: it obviously needs
 plenty of cleanup. I have confirmed that it works, though.

 Before I take it any further, though, I'd like to get some feedback on
 the approach. I prefer this to the alternative, because the underlying
 problem is deeper than supporting evacuate. I'd prefer to be honest with
 the user and just say it ain't gonna work. The alternative would leave
 Nova running in a broken state, leaving inconsistent state in its wake
 as it runs.

 Matt

 -- 
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

 


-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490



Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton


On 2/11/15, 6:35 PM, Sylvain Bauza sba...@redhat.com wrote:


On 11/02/2015 17:04, Gary Kotton wrote:
 I posted a fix that does not break things and supports HA.
 https://review.openstack.org/154029


Just let's be clear, HA is *not* supported by Nova now.

That is not correct. It is actually supported if the host_ip is the same. If
the host_ip is not the same then there is an issue when one of the compute
nodes restarts: it will delete all instances that do not have its host_ip.
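
A minimal sketch of the restart behaviour described above, under stated assumptions: `Instance` and `instances_to_destroy` are hypothetical names made up for illustration, not Nova's API, and the per-instance value compared here is simply called `host`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    uuid: str
    host: str  # the compute service value recorded for this instance


def instances_to_destroy(local_instances, my_host):
    """Return instances seen on the hypervisor but recorded under another host.

    In this simplified model a restarting compute service treats such
    instances as belonging elsewhere and cleans them up locally.
    """
    return [inst for inst in local_instances if inst.host != my_host]


# Both services manage the same cluster, so both see every VM.
cluster_vms = [Instance("vm-1", "compute-a"), Instance("vm-2", "compute-b")]

# Different values: a restart of compute-b would select vm-1 for deletion.
doomed = instances_to_destroy(cluster_vms, "compute-b")

# Same value on both services: nothing is selected.
shared = [Instance("vm-1", "compute"), Instance("vm-2", "compute")]
safe = instances_to_destroy(shared, "compute")
```

With different values the other service's instance is picked up for deletion; with a shared value the list is empty.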


The main reason is that compute *nodes* are considered given by the
hypervisor (ie. the virt driver ran by the compute manager worker), so
if 2 or more hypervisors on two distinct machines are getting the same
list of nodes, then you would have duplicates.

No. There are no duplicates.


-Sylvain

 On 2/11/15, 5:55 PM, Matthew Booth mbo...@redhat.com wrote:

 On 11/02/15 15:49, Gary Kotton wrote:
 Hi,
 I do not think that that is a healthy solution. That effectively would
 render a cluster down if the compute node goes down. That would be a real
 disaster. The ugly work around is setting the host names to be the same
 value.
 I don't think that's an ugly work around. I think that's the only
 currently viable solution.

 This is something that we should discuss at the next summit and I would
 hope to propose a topic to talk about.
 Sounds like a good plan. However, given that the bug is marked Critical
 I was assuming we wanted a more expedient fix, which is what I've
 proposed.

 Matt

 Thanks
 Gary

 On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:

 I just posted this:

 https://review.openstack.org/#/c/154907/

 as an alternative fix for critical bug:

 https://bugs.launchpad.net/nova/+bug/1419785

 I've just knocked this up quickly for illustration: it obviously needs
 plenty of cleanup. I have confirmed that it works, though.

 Before I take it any further, though, I'd like to get some feedback on
 the approach. I prefer this to the alternative, because the underlying
 problem is deeper than supporting evacuate. I'd prefer to be honest with
 the user and just say it ain't gonna work. The alternative would leave
 Nova running in a broken state, leaving inconsistent state in its wake
 as it runs.

 Matt

 -- 
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


 


 -- 
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

 





Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Matthew Booth
On 11/02/15 16:40, Gary Kotton wrote:
 
 
 On 2/11/15, 6:35 PM, Sylvain Bauza sba...@redhat.com wrote:
 

 On 11/02/2015 17:04, Gary Kotton wrote:
 I posted a fix that does not break things and supports HA.
 https://review.openstack.org/154029


 Just let's be clear, HA is *not* supported by Nova now.
 
 That is not correct. It is actually supported if the host_ip is the same. If
 the host_ip is not the same then there is an issue when one of the compute
 nodes restarts: it will delete all instances that do not have its host_ip.

FWIW, I suspect you're correct. My patch enforces that.

So, check out the init code from compute manager:

    def init_host(self):
        """Initialization for a standalone compute service."""
        self.driver.init_host(host=self.host)

The driver is being configured with the Compute service's 'host' config
variable. Let's scan through and look what else uses that:


    def _update_resource_tracker(self, context, instance):
        """Let the resource tracker know that an instance has changed
        state."""

        if (instance['host'] == self.host and
                self.driver.node_is_available(instance['node'])):

Instances with the old host value will be ignored by
_update_resource_tracker. That doesn't sound good.
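
To make that concrete, here is a cut-down sketch that reuses the same condition. `FakeDriver` and `FakeManager` are hypothetical stand-ins written for this example, not Nova classes, and the `tracked` list merely records which instances would reach the resource tracker.

```python
class FakeDriver:
    """Stand-in for a virt driver that reports every node as available."""

    def node_is_available(self, node):
        return True


class FakeManager:
    """Hypothetical, cut-down compute manager keyed on its 'host' value."""

    def __init__(self, host):
        self.host = host
        self.driver = FakeDriver()
        self.tracked = []

    def update_resource_tracker(self, instance):
        # Same condition as the excerpt above: only instances recorded
        # under this service's host get past the check.
        if (instance['host'] == self.host and
                self.driver.node_is_available(instance['node'])):
            self.tracked.append(instance['uuid'])


manager = FakeManager(host='compute-b')
manager.update_resource_tracker(
    {'uuid': 'vm-1', 'host': 'compute-a', 'node': 'cluster1'})
manager.update_resource_tracker(
    {'uuid': 'vm-2', 'host': 'compute-b', 'node': 'cluster1'})
```

Only the instance recorded under 'compute-b' is tracked; the one created by 'compute-a' is silently skipped even though it lives on the same node.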

There's _destroy_evacuated_instances(), which you already found :)

There's this in _validate_instance_group_policy:

    group_hosts = group.get_hosts(context, exclude=[instance.uuid])
    if self.host in group_hosts:
        msg = _("Anti-affinity instance group policy was violated.")

You've changed group affinity.
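
A small sketch of why the check changes meaning, assuming the membership test shown above is the whole decision. `violates_anti_affinity` is a hypothetical helper for illustration only.

```python
def violates_anti_affinity(my_host, group_hosts):
    """Mirror of the membership test above.

    A violation is only detected when this service's own host value
    appears among the hosts already used by the group.
    """
    return my_host in group_hosts


# A group member already runs on the cluster, recorded under 'compute-a'.
group_hosts = ['compute-a']

# A second service manages the same cluster under another host value,
# so the check passes although both instances share one cluster.
missed = violates_anti_affinity('compute-b', group_hosts)

# With one shared host value the violation is detected.
caught = violates_anti_affinity('compute-a', group_hosts)
```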

There's _check_instance_build_time:

    filters = {'vm_state': vm_states.BUILDING,
               'host': self.host}

Nova won't check instances created by the old host to see if they're stuck.

There's this in _heal_instance_info_cache:

    db_instances = objects.InstanceList.get_by_host(
        context, self.host, expected_attrs=[], use_slave=True)

This cleanup job won't find instances created by the old host.

There's this in _poll_rebooting_instances:

    filters = {'task_state':
               [task_states.REBOOTING,
                task_states.REBOOT_STARTED,
                task_states.REBOOT_PENDING],
               'host': self.host}

Nova won't poll instances created by the old host.
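
The periodic tasks above all share the same shape: a query filtered on 'host'. A rough sketch of the effect follows, where `filter_instances` is a hypothetical stand-in for the database query and the task state strings are placeholders rather than Nova's constants.

```python
def filter_instances(instances, filters):
    """Tiny stand-in for a DB query: keep rows matching every filter.

    A list-valued filter matches if the row's value is in the list.
    """
    def matches(row):
        for key, wanted in filters.items():
            if isinstance(wanted, list):
                if row[key] not in wanted:
                    return False
            elif row[key] != wanted:
                return False
        return True

    return [row for row in instances if matches(row)]


rows = [
    {'uuid': 'vm-1', 'host': 'compute-a', 'task_state': 'rebooting'},
    {'uuid': 'vm-2', 'host': 'compute-b', 'task_state': 'rebooting'},
]

# compute-a has gone away; only compute-b still runs the periodic task.
filters = {'task_state': ['rebooting', 'reboot_started', 'reboot_pending'],
           'host': 'compute-b'}
polled = [row['uuid'] for row in filter_instances(rows, filters)]
```

Here vm-1 is stuck rebooting but is never polled, because no running service has 'compute-a' as its host.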

This is just a cursory flick through. I'm fairly sure this is going to
be a lot of work to fix. My patch just ensures that Nova refuses to
start instead of letting bad things happen. If you ensure that
'self.host' in the above code is the same for all HA nodes I don't see
why it shouldn't work, though. My patch won't prevent that.
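
For illustration only, the general idea of refusing to start could look something like the sketch below. This is an assumption about the shape of such a check, not the code in the review; `InconsistentHostError` and `check_host_consistency` are hypothetical names.

```python
class InconsistentHostError(RuntimeError):
    """Raised when instances on the cluster are recorded under another host."""


def check_host_consistency(my_host, recorded_hosts):
    """Refuse to continue if any recorded host differs from ours."""
    others = sorted(set(recorded_hosts) - {my_host})
    if others:
        raise InconsistentHostError(
            "instances are recorded under other hosts: %s" % ", ".join(others))


# Consistent configuration: passes silently.
check_host_consistency('compute', ['compute', 'compute'])

# Mismatch: startup is aborted instead of running in a broken state.
try:
    check_host_consistency('compute-b', ['compute-a', 'compute-b'])
    refused = False
except InconsistentHostError as exc:
    refused = True
    message = str(exc)
```

Failing early trades availability for consistency: the service stays down, but none of the host-keyed code paths above get a chance to act on a partial view.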

Matt

 

 The main reason is that compute *nodes* are considered given by the
 hypervisor (ie. the virt driver ran by the compute manager worker), so
 if 2 or more hypervisors on two distinct machines are getting the same
 list of nodes, then you would have duplicates.
 
 No. There are no duplicates.
 

 -Sylvain

 On 2/11/15, 5:55 PM, Matthew Booth mbo...@redhat.com wrote:

 On 11/02/15 15:49, Gary Kotton wrote:
 Hi,
 I do not think that that is a healthy solution. That effectively would
 render a cluster down if the compute node goes down. That would be a real
 disaster. The ugly work around is setting the host names to be the same
 value.
 I don't think that's an ugly work around. I think that's the only
 currently viable solution.

 This is something that we should discuss at the next summit and I would
 hope to propose a topic to talk about.
 Sounds like a good plan. However, given that the bug is marked Critical
 I was assuming we wanted a more expedient fix, which is what I've
 proposed.

 Matt

 Thanks
 Gary

 On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:

 I just posted this:

 https://review.openstack.org/#/c/154907/

 as an alternative fix for critical bug:

 https://bugs.launchpad.net/nova/+bug/1419785

 I've just knocked this up quickly for illustration: it obviously needs
 plenty of cleanup. I have confirmed that it works, though.

 Before I take it any further, though, I'd like to get some feedback on
 the approach. I prefer this to the alternative, because the underlying
 problem is deeper than supporting evacuate. I'd prefer to be honest with
 the user and just say it ain't gonna work. The alternative would leave
 Nova running in a broken state, leaving inconsistent state in its wake
 as it runs.

 Matt

 -- 
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490






 

Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames

2015-02-11 Thread Gary Kotton
Hi,
I do not think that that is a healthy solution. That effectively would
render a cluster down if the compute node goes down. That would be a real
disaster. The ugly work around is setting the host names to be the same
value.
This is something that we should discuss at the next summit and I would
hope to propose a topic to talk about.
Thanks
Gary

On 2/11/15, 5:31 PM, Matthew Booth mbo...@redhat.com wrote:

I just posted this:

https://review.openstack.org/#/c/154907/

as an alternative fix for critical bug:

https://bugs.launchpad.net/nova/+bug/1419785

I've just knocked this up quickly for illustration: it obviously needs
plenty of cleanup. I have confirmed that it works, though.

Before I take it any further, though, I'd like to get some feedback on
the approach. I prefer this to the alternative, because the underlying
problem is deeper than supporting evacuate. I'd prefer to be honest with
the user and just say it ain't gonna work. The alternative would leave
Nova running in a broken state, leaving inconsistent state in its wake
as it runs.

Matt

-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
