Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-26 Thread Matthew Booth
On 25/02/15 20:18, Joe Gordon wrote:
 
 
 On Fri, Feb 20, 2015 at 3:48 AM, Matthew Booth mbo...@redhat.com wrote:
 
 Gary Kotton came across a doozy of a bug recently:
 
 https://bugs.launchpad.net/nova/+bug/1419785
 
 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor. However,
 Gary found that you trigger this when starting up a backup HA node which
 has a different `host` config setting. i.e. You fail over, and the first
 thing it does is delete all your instances.
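
 To make that concrete, the check is roughly of this shape (a simplified
 sketch with invented helper names, not Nova's actual code -- the real
 logic lives in _destroy_evacuated_instances()):

     def cleanup_after_restart(driver, db, conf_host):
         # Ask the hypervisor what it is actually running.
         for instance in driver.list_instances():
             record = db.get_instance(instance.uuid)
             # The DB says this instance belongs to another host, so it is
             # assumed to have been evacuated while we were down...
             if record.host != conf_host:
                 # ...and is therefore deleted from the hypervisor.
                 driver.destroy(instance)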
 
 Gary and I both agree on a couple of things:
 
 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers
 
 
 There is a deeper issue here that we are trying to work around.  Nova
 was never designed to have entire systems running behind a nova-compute.
 It was designed to have one nova-compute per 'physical box that runs
 instances'.
 
 There have been many discussions in the past on how to fix this issue
 (by adding a new point in nova where clustered systems can plug in), but
 if I remember correctly the gotcha was no one was willing to step up to
 do it.

There are 2 unrelated concepts of clusters here. The VMware driver has
both, which seems to result in some confusion. As it happens, this issue
doesn't relate to either of them.

Firstly, there's a VMware cluster. This presents itself as, and is managed
as, a single hypervisor. The only issue Nova has with VMware clusters is
in the resource tracker, because its resources aren't contiguous, i.e.
it's an accounting issue. It would be good to have a solution to this,
but it doesn't seem to be causing many real world problems.

Secondly there's the concept of 'nodes', whereby 1 nova compute can
manage multiple hypervisors. On VMware this means managing multiple
clusters, because 1 VMware cluster == 1 hypervisor. Both the Ironic and
VMware drivers can do this.

This issue relates to the co-location of nova compute with the
hypervisor. In the case of both Ironic and VMware, it is not possible to
co-locate nova compute with the hypervisor. That means that nova compute
must exist separately and be pointed at the hypervisor, which raises the
possibility that 2 different nova computes might accidentally be pointed
at the same hypervisor. As Gary discovered, this makes bad things
happen. Note that no clusters of either kind described above are
required to trigger this bug.
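
Concretely, the misconfiguration amounts to two nova-compute services whose
nova.conf files differ in `host` but point at the same hypervisor, something
like this (illustrative values only; I'm quoting the VMware option names from
memory, so check the driver docs):

    # compute node A nova.conf
    [DEFAULT]
    host = compute-a
    [vmware]
    host_ip = vcenter.example.com
    cluster_name = Cluster1

    # compute node B nova.conf -- same cluster, but a different `host`.
    # This is exactly the accidental setup that triggers the bug.
    [DEFAULT]
    host = compute-b
    [vmware]
    host_ip = vcenter.example.com
    cluster_name = Cluster1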

I have a new patch for this here, btw:
https://review.openstack.org/#/c/158269/ . I'd be grateful for more eyes
on it.

Thanks,

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-25 Thread Matthew Booth
On 25/02/15 11:51, Radoslav Gerganov wrote:
 On 02/23/2015 03:18 PM, Matthew Booth wrote:
 On 23/02/15 12:13, Gary Kotton wrote:


 On 2/23/15, 2:05 PM, Matthew Booth mbo...@redhat.com wrote:

 On 20/02/15 11:48, Matthew Booth wrote:
 Gary Kotton came across a doozy of a bug recently:

 https://bugs.launchpad.net/nova/+bug/1419785

 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the
 instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor.
 However,
 Gary found that you trigger this when starting up a backup HA node
 which
 has a different `host` config setting. i.e. You fail over, and the
 first
 thing it does is delete all your instances.

 Gary and I both agree on a couple of things:

 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers

 We disagree on the approach to fixing it, though. Gary posted this:

 https://review.openstack.org/#/c/154029/

 I've already outlined my objections to this approach elsewhere, but to
 summarise I think this fixes 1 symptom of a design problem, and leaves
 the rest untouched. If the value of nova compute's `host` changes,
 then
 the assumption that instances associated with that compute can be
 identified by the value of instance.host becomes invalid. This
 assumption is pervasive, so it breaks a lot of stuff. The worst one is
 _destroy_evacuated_instances(), which Gary found, but if you scan
 nova/compute/manager for the string 'self.host' you'll find lots of
 them. For example, all the periodic tasks are broken, including image
 cache management, and the state of ResourceTracker will be unusual.
 Worse, whenever a new instance is created it will have a different
 value
 of instance.host, so instances running on a single hypervisor will
 become partitioned based on which nova compute was used to create
 them.

 In short, the system may appear to function superficially, but it's
 unsupportable.

 I had an alternative idea. The current assumption is that the `host`
 managing a single hypervisor never changes. If we break that
 assumption,
 we break Nova, so we could assert it at startup and refuse to start if
 it's violated. I posted this VMware-specific POC:

 https://review.openstack.org/#/c/154907/

 However, I think I've had a better idea. Nova creates ComputeNode
 objects for its current configuration at startup which, amongst other
 things, are a map of host:hypervisor_hostname. We could assert when
 creating a ComputeNode that hypervisor_hostname is not already
 associated with a different host, and refuse to start if it is. We
 would
 give an appropriate error message explaining that this is a
 misconfiguration. This would prevent the user from hitting any of the
 associated problems, including the deletion of all their instances.

 I have posted a patch implementing the above for review here:

 https://review.openstack.org/#/c/158269/

 I have to look at what you have posted. I think that this topic is
 something that we should speak about at the summit and this should fall
 under a BP and a well-defined spec. I really would not like to see
 existing installations being broken if and when this patch lands. It may
 also affect Ironic as it works on the same model.

 This patch will only affect installations configured with multiple
 compute hosts for a single hypervisor. These are already broken, so this
 patch will at least let them know if they haven't already noticed.

 It won't affect Ironic, because they configure all compute hosts to have
 the same 'host' value. An Ironic user would only notice this patch if
 they accidentally misconfigured it, which is the intended behaviour.

 Incidentally, I also support more focus on the design here. Until we
 come up with a better design, though, we need to do our best to prevent
 non-trivial corruption from a trivial misconfiguration. I think we need
 to merge this, or something like it, now and still have a summit
 discussion.

 Matt

 
 Hi Matt,
 
 I already posted a comment on your patch but I'd like to reiterate here
 as well.  Currently the VMware driver is using the cluster name as
 hypervisor_hostname which is a problem because you can have different
 clusters with the same name.  We already have a critical bug filed for
 this:
 
 https://bugs.launchpad.net/nova/+bug/1329261
 
 There was an attempt to fix this by using a combination of vCenter UUID
 + cluster_name but it was rejected because this combination was not
 considered a 'real' hostname.  I think that if we go for a DB schema
 change we can fix both issues by renaming hypervisor_hostname to
 hypervisor_id and making it unique.  What do you think?

Well, I think hypervisor_id makes more sense than hypervisor_hostname.
The latter is confusing. However, I'd prefer not to 

Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-25 Thread Radoslav Gerganov

On 02/23/2015 03:18 PM, Matthew Booth wrote:

On 23/02/15 12:13, Gary Kotton wrote:



On 2/23/15, 2:05 PM, Matthew Booth mbo...@redhat.com wrote:


On 20/02/15 11:48, Matthew Booth wrote:

Gary Kotton came across a doozy of a bug recently:

https://bugs.launchpad.net/nova/+bug/1419785

In short, when you start a Nova compute, it will query the driver for
instances and compare that against the expected host of the instance
according to the DB. If the driver is reporting an instance the DB
thinks is on a different host, it assumes the instance was evacuated
while Nova compute was down, and deletes it on the hypervisor. However,
Gary found that you trigger this when starting up a backup HA node which
has a different `host` config setting. i.e. You fail over, and the first
thing it does is delete all your instances.

Gary and I both agree on a couple of things:

1. Deleting all your instances is bad
2. HA nova compute is highly desirable for some drivers

We disagree on the approach to fixing it, though. Gary posted this:

https://review.openstack.org/#/c/154029/

I've already outlined my objections to this approach elsewhere, but to
summarise I think this fixes 1 symptom of a design problem, and leaves
the rest untouched. If the value of nova compute's `host` changes, then
the assumption that instances associated with that compute can be
identified by the value of instance.host becomes invalid. This
assumption is pervasive, so it breaks a lot of stuff. The worst one is
_destroy_evacuated_instances(), which Gary found, but if you scan
nova/compute/manager for the string 'self.host' you'll find lots of
them. For example, all the periodic tasks are broken, including image
cache management, and the state of ResourceTracker will be unusual.
Worse, whenever a new instance is created it will have a different value
of instance.host, so instances running on a single hypervisor will
become partitioned based on which nova compute was used to create them.

In short, the system may appear to function superficially, but it's
unsupportable.

I had an alternative idea. The current assumption is that the `host`
managing a single hypervisor never changes. If we break that assumption,
we break Nova, so we could assert it at startup and refuse to start if
it's violated. I posted this VMware-specific POC:

https://review.openstack.org/#/c/154907/

However, I think I've had a better idea. Nova creates ComputeNode
objects for its current configuration at startup which, amongst other
things, are a map of host:hypervisor_hostname. We could assert when
creating a ComputeNode that hypervisor_hostname is not already
associated with a different host, and refuse to start if it is. We would
give an appropriate error message explaining that this is a
misconfiguration. This would prevent the user from hitting any of the
associated problems, including the deletion of all their instances.


I have posted a patch implementing the above for review here:

https://review.openstack.org/#/c/158269/


I have to look at what you have posted. I think that this topic is
something that we should speak about at the summit and this should fall
under a BP and a well-defined spec. I really would not like to see
existing installations being broken if and when this patch lands. It may
also affect Ironic as it works on the same model.


This patch will only affect installations configured with multiple
compute hosts for a single hypervisor. These are already broken, so this
patch will at least let them know if they haven't already noticed.

It won't affect Ironic, because they configure all compute hosts to have
the same 'host' value. An Ironic user would only notice this patch if
they accidentally misconfigured it, which is the intended behaviour.

Incidentally, I also support more focus on the design here. Until we
come up with a better design, though, we need to do our best to prevent
non-trivial corruption from a trivial misconfiguration. I think we need
to merge this, or something like it, now and still have a summit discussion.

Matt



Hi Matt,

I already posted a comment on your patch but I'd like to reiterate here 
as well.  Currently the VMware driver is using the cluster name as 
hypervisor_hostname which is a problem because you can have different 
clusters with the same name.  We already have a critical bug filed for this:


https://bugs.launchpad.net/nova/+bug/1329261

There was an attempt to fix this by using a combination of vCenter UUID 
+ cluster_name but it was rejected because this combination was not 
considered a 'real' hostname.  I think that if we go for a DB schema 
change we can fix both issues by renaming hypervisor_hostname to 
hypervisor_id and making it unique.  What do you think?
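
For illustration only (invented helper name, not a concrete proposal), the
kind of value such a hypervisor_id could hold:

    def make_hypervisor_id(vcenter_uuid, cluster_name):
        # Combine the vCenter UUID with the cluster name, as in the earlier
        # (rejected) attempt; the result stays unique across vCenters even
        # when two of them contain a cluster with the same display name.
        return '%s:%s' % (vcenter_uuid, cluster_name)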


Rado

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-25 Thread Joe Gordon
On Fri, Feb 20, 2015 at 3:48 AM, Matthew Booth mbo...@redhat.com wrote:

 Gary Kotton came across a doozy of a bug recently:

 https://bugs.launchpad.net/nova/+bug/1419785

 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor. However,
 Gary found that you trigger this when starting up a backup HA node which
 has a different `host` config setting. i.e. You fail over, and the first
 thing it does is delete all your instances.

 Gary and I both agree on a couple of things:

 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers


There is a deeper issue here that we are trying to work around.  Nova was
never designed to have entire systems running behind a nova-compute. It was
designed to have one nova-compute per 'physical box that runs instances'.

There have been many discussions in the past on how to fix this issue (by
adding a new point in nova where clustered systems can plug in), but if I
remember correctly the gotcha was no one was willing to step up to do it.



 We disagree on the approach to fixing it, though. Gary posted this:

 https://review.openstack.org/#/c/154029/

 I've already outlined my objections to this approach elsewhere, but to
 summarise I think this fixes 1 symptom of a design problem, and leaves
 the rest untouched. If the value of nova compute's `host` changes, then
 the assumption that instances associated with that compute can be
 identified by the value of instance.host becomes invalid. This
 assumption is pervasive, so it breaks a lot of stuff. The worst one is
 _destroy_evacuated_instances(), which Gary found, but if you scan
 nova/compute/manager for the string 'self.host' you'll find lots of
 them. For example, all the periodic tasks are broken, including image
 cache management, and the state of ResourceTracker will be unusual.
 Worse, whenever a new instance is created it will have a different value
 of instance.host, so instances running on a single hypervisor will
 become partitioned based on which nova compute was used to create them.
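
 For illustration, the host-scoped pattern that breaks looks roughly like
 this throughout the manager (simplified sketch, invented names):

     def instances_to_manage(db, conf_host):
         # Periodic tasks only visit instances whose instance.host matches
         # this service's `host`, so instances recorded against an old
         # `host` value are silently skipped by image cache management,
         # power state sync, and friends.
         return [i for i in db.all_instances() if i.host == conf_host]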

 In short, the system may appear to function superficially, but it's
 unsupportable.

 I had an alternative idea. The current assumption is that the `host`
 managing a single hypervisor never changes. If we break that assumption,
 we break Nova, so we could assert it at startup and refuse to start if
 it's violated. I posted this VMware-specific POC:

 https://review.openstack.org/#/c/154907/

 However, I think I've had a better idea. Nova creates ComputeNode
 objects for its current configuration at startup which, amongst other
 things, are a map of host:hypervisor_hostname. We could assert when
 creating a ComputeNode that hypervisor_hostname is not already
 associated with a different host, and refuse to start if it is. We would
 give an appropriate error message explaining that this is a
 misconfiguration. This would prevent the user from hitting any of the
 associated problems, including the deletion of all their instances.

 We can still do active/passive HA!

 If we configure both nodes in the active/passive cluster identically,
 including with the same value of `host`, I don't see why this shouldn't
 work today. I don't even think the configuration is onerous. All we
 would be doing is preventing the user from accidentally running a
 misconfigured HA setup, which leads to inconsistent state and will
 eventually require manual cleanup.
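
 In nova.conf terms that is simply (illustrative value):

     # identical on both the active and the passive nova-compute
     [DEFAULT]
     host = vmware-compute-1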

 We would still have to be careful that we don't bring up both nova
 computes simultaneously. The VMware driver, at least, has hardcoded
 assumptions that it is the only writer in certain circumstances. That
 problem would have to be handled separately, perhaps at the messaging
 layer.

 Matt
 --
 Matthew Booth
 Red Hat Engineering, Virtualisation Team

 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-23 Thread Matthew Booth
On 20/02/15 11:48, Matthew Booth wrote:
 Gary Kotton came across a doozy of a bug recently:
 
 https://bugs.launchpad.net/nova/+bug/1419785
 
 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor. However,
 Gary found that you trigger this when starting up a backup HA node which
 has a different `host` config setting. i.e. You fail over, and the first
 thing it does is delete all your instances.
 
 Gary and I both agree on a couple of things:
 
 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers
 
 We disagree on the approach to fixing it, though. Gary posted this:
 
 https://review.openstack.org/#/c/154029/
 
 I've already outlined my objections to this approach elsewhere, but to
 summarise I think this fixes 1 symptom of a design problem, and leaves
 the rest untouched. If the value of nova compute's `host` changes, then
 the assumption that instances associated with that compute can be
 identified by the value of instance.host becomes invalid. This
 assumption is pervasive, so it breaks a lot of stuff. The worst one is
 _destroy_evacuated_instances(), which Gary found, but if you scan
 nova/compute/manager for the string 'self.host' you'll find lots of
 them. For example, all the periodic tasks are broken, including image
 cache management, and the state of ResourceTracker will be unusual.
 Worse, whenever a new instance is created it will have a different value
 of instance.host, so instances running on a single hypervisor will
 become partitioned based on which nova compute was used to create them.
 
 In short, the system may appear to function superficially, but it's
 unsupportable.
 
 I had an alternative idea. The current assumption is that the `host`
 managing a single hypervisor never changes. If we break that assumption,
 we break Nova, so we could assert it at startup and refuse to start if
 it's violated. I posted this VMware-specific POC:
 
 https://review.openstack.org/#/c/154907/
 
 However, I think I've had a better idea. Nova creates ComputeNode
 objects for its current configuration at startup which, amongst other
 things, are a map of host:hypervisor_hostname. We could assert when
 creating a ComputeNode that hypervisor_hostname is not already
 associated with a different host, and refuse to start if it is. We would
 give an appropriate error message explaining that this is a
 misconfiguration. This would prevent the user from hitting any of the
 associated problems, including the deletion of all their instances.

I have posted a patch implementing the above for review here:

https://review.openstack.org/#/c/158269/
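
For anyone who hasn't opened the review, the check is roughly of this shape
(a simplified sketch of the idea with invented names, not the patch itself):

    def assert_hypervisor_not_claimed(db, hypervisor_hostname, my_host):
        node = db.get_compute_node_by_hypervisor_hostname(hypervisor_hostname)
        if node is not None and node.host != my_host:
            raise RuntimeError(
                "Hypervisor %s is already managed by host %s; refusing to "
                "start. Two nova computes must not be pointed at the same "
                "hypervisor with different `host` values."
                % (hypervisor_hostname, node.host))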

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-23 Thread Gary Kotton


On 2/23/15, 2:05 PM, Matthew Booth mbo...@redhat.com wrote:

On 20/02/15 11:48, Matthew Booth wrote:
 Gary Kotton came across a doozy of a bug recently:
 
 https://bugs.launchpad.net/nova/+bug/1419785
 
 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor. However,
 Gary found that you trigger this when starting up a backup HA node which
 has a different `host` config setting. i.e. You fail over, and the first
 thing it does is delete all your instances.
 
 Gary and I both agree on a couple of things:
 
 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers
 
 We disagree on the approach to fixing it, though. Gary posted this:
 
 https://review.openstack.org/#/c/154029/
 
 I've already outlined my objections to this approach elsewhere, but to
 summarise I think this fixes 1 symptom of a design problem, and leaves
 the rest untouched. If the value of nova compute's `host` changes, then
 the assumption that instances associated with that compute can be
 identified by the value of instance.host becomes invalid. This
 assumption is pervasive, so it breaks a lot of stuff. The worst one is
 _destroy_evacuated_instances(), which Gary found, but if you scan
 nova/compute/manager for the string 'self.host' you'll find lots of
 them. For example, all the periodic tasks are broken, including image
 cache management, and the state of ResourceTracker will be unusual.
 Worse, whenever a new instance is created it will have a different value
 of instance.host, so instances running on a single hypervisor will
 become partitioned based on which nova compute was used to create them.
 
 In short, the system may appear to function superficially, but it's
 unsupportable.
 
 I had an alternative idea. The current assumption is that the `host`
 managing a single hypervisor never changes. If we break that assumption,
 we break Nova, so we could assert it at startup and refuse to start if
 it's violated. I posted this VMware-specific POC:
 
 https://review.openstack.org/#/c/154907/
 
 However, I think I've had a better idea. Nova creates ComputeNode
 objects for its current configuration at startup which, amongst other
 things, are a map of host:hypervisor_hostname. We could assert when
 creating a ComputeNode that hypervisor_hostname is not already
 associated with a different host, and refuse to start if it is. We would
 give an appropriate error message explaining that this is a
 misconfiguration. This would prevent the user from hitting any of the
 associated problems, including the deletion of all their instances.

I have posted a patch implementing the above for review here:

https://review.openstack.org/#/c/158269/

I have to look at what you have posted. I think that this topic is
something that we should speak about at the summit and this should fall
under a BP and a well-defined spec. I really would not like to see
existing installations being broken if and when this patch lands. It may
also affect Ironic as it works on the same model.


Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

2015-02-23 Thread Matthew Booth
On 23/02/15 12:13, Gary Kotton wrote:
 
 
 On 2/23/15, 2:05 PM, Matthew Booth mbo...@redhat.com wrote:
 
 On 20/02/15 11:48, Matthew Booth wrote:
 Gary Kotton came across a doozy of a bug recently:

 https://bugs.launchpad.net/nova/+bug/1419785

 In short, when you start a Nova compute, it will query the driver for
 instances and compare that against the expected host of the instance
 according to the DB. If the driver is reporting an instance the DB
 thinks is on a different host, it assumes the instance was evacuated
 while Nova compute was down, and deletes it on the hypervisor. However,
 Gary found that you trigger this when starting up a backup HA node which
 has a different `host` config setting. i.e. You fail over, and the first
 thing it does is delete all your instances.

 Gary and I both agree on a couple of things:

 1. Deleting all your instances is bad
 2. HA nova compute is highly desirable for some drivers

 We disagree on the approach to fixing it, though. Gary posted this:

 https://review.openstack.org/#/c/154029/

 I've already outlined my objections to this approach elsewhere, but to
 summarise I think this fixes 1 symptom of a design problem, and leaves
 the rest untouched. If the value of nova compute's `host` changes, then
 the assumption that instances associated with that compute can be
 identified by the value of instance.host becomes invalid. This
 assumption is pervasive, so it breaks a lot of stuff. The worst one is
 _destroy_evacuated_instances(), which Gary found, but if you scan
 nova/compute/manager for the string 'self.host' you'll find lots of
 them. For example, all the periodic tasks are broken, including image
 cache management, and the state of ResourceTracker will be unusual.
 Worse, whenever a new instance is created it will have a different value
 of instance.host, so instances running on a single hypervisor will
 become partitioned based on which nova compute was used to create them.

 In short, the system may appear to function superficially, but it's
 unsupportable.

 I had an alternative idea. The current assumption is that the `host`
 managing a single hypervisor never changes. If we break that assumption,
 we break Nova, so we could assert it at startup and refuse to start if
 it's violated. I posted this VMware-specific POC:

 https://review.openstack.org/#/c/154907/

 However, I think I've had a better idea. Nova creates ComputeNode
 objects for its current configuration at startup which, amongst other
 things, are a map of host:hypervisor_hostname. We could assert when
 creating a ComputeNode that hypervisor_hostname is not already
 associated with a different host, and refuse to start if it is. We would
 give an appropriate error message explaining that this is a
 misconfiguration. This would prevent the user from hitting any of the
 associated problems, including the deletion of all their instances.

 I have posted a patch implementing the above for review here:

 https://review.openstack.org/#/c/158269/
 
 I have to look at what you have posted. I think that this topic is
 something that we should speak about at the summit and this should fall
 under a BP and a well-defined spec. I really would not like to see
 existing installations being broken if and when this patch lands. It may
 also affect Ironic as it works on the same model.

This patch will only affect installations configured with multiple
compute hosts for a single hypervisor. These are already broken, so this
patch will at least let them know if they haven't already noticed.

It won't affect Ironic, because they configure all compute hosts to have
the same 'host' value. An Ironic user would only notice this patch if
they accidentally misconfigured it, which is the intended behaviour.

Incidentally, I also support more focus on the design here. Until we
come up with a better design, though, we need to do our best to prevent
non-trivial corruption from a trivial misconfiguration. I think we need
to merge this, or something like it, now and still have a summit discussion.

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev