Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
On 11/02/15 16:40, Gary Kotton wrote:
> On 2/11/15, 6:35 PM, "Sylvain Bauza" wrote:
>> On 11/02/2015 17:04, Gary Kotton wrote:
>>> I posted a fix that does not break things and supports HA.
>>> https://review.openstack.org/154029
>>
>> Let's be clear: HA is *not* supported by Nova now.
>
> That is not correct. It is actually supported if the host_ip is the same.
> If the host_ip is not the same then there is an issue when one of the
> compute nodes restarts: it will delete all instances that do not have its
> host_ip.

FWIW, I suspect you're correct. My patch enforces that.

So, check out the init code from the compute manager:

    def init_host(self):
        """Initialization for a standalone compute service."""
        self.driver.init_host(host=self.host)

The driver is being configured with the Compute service's 'host' config variable. Let's scan through and look at what else uses that:

    def _update_resource_tracker(self, context, instance):
        """Let the resource tracker know that an instance has changed state."""
        if (instance['host'] == self.host and
                self.driver.node_is_available(instance['node'])):
            ...

Instances with the old host value will be ignored by _update_resource_tracker. That doesn't sound good.

There's _destroy_evacuated_instances(), which you already found :)

There's this in _validate_instance_group_policy:

    group_hosts = group.get_hosts(context, exclude=[instance.uuid])
    if self.host in group_hosts:
        msg = _("Anti-affinity instance group policy was violated.")

You've changed group affinity.

There's _check_instance_build_time:

    filters = {'vm_state': vm_states.BUILDING, 'host': self.host}

Nova won't check instances created by the old host to see if they're stuck.

There's this in _heal_instance_info_cache:

    db_instances = objects.InstanceList.get_by_host(
        context, self.host, expected_attrs=[], use_slave=True)

This cleanup job won't find instances created by the old host.

There's this in _poll_rebooting_instances:

    filters = {'task_state': [task_states.REBOOTING,
                              task_states.REBOOT_STARTED,
                              task_states.REBOOT_PENDING],
               'host': self.host}

Nova won't poll instances created by the old host.

This is just a cursory flick through. I'm fairly sure this is going to be a lot of work to fix. My patch just ensures that Nova refuses to start instead of letting bad things happen. If you ensure that 'self.host' in the above code is the same for all HA nodes I don't see why it shouldn't work, though. My patch won't prevent that.

Matt
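To make the failure mode in the walk-through above concrete, here is a toy model, not Nova code. The `Instance` record and the `periodic_task_view()` helper are hypothetical stand-ins for host-scoped lookups such as `InstanceList.get_by_host(context, self.host)` or the `{'host': self.host}` filters quoted above, and the host names `compute-a`, `compute-b` and `cluster-ha` are invented for illustration. It shows that a service whose `host` differs from the one that created an instance never sees that instance, while HA nodes sharing one `host` value see everything.

```python
"""Toy model of host-scoped lookups in a Nova-like compute manager.

Not Nova code: the Instance record and periodic_task_view() are
hypothetical stand-ins for the filters quoted in the email.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    uuid: str
    host: str  # 'host' of the compute service that created it
    vm_state: str


def periodic_task_view(instances, service_host, vm_state=None):
    """Mimic a DB filter such as {'vm_state': ..., 'host': self.host}."""
    return [
        inst for inst in instances
        if inst.host == service_host
        and (vm_state is None or inst.vm_state == vm_state)
    ]


# Two HA nodes manage the same vCenter cluster but were configured
# with different 'host' values (hypothetical names).
db = [
    Instance("a1", host="compute-a", vm_state="building"),
    Instance("a2", host="compute-a", vm_state="active"),
    Instance("b1", host="compute-b", vm_state="building"),
]

# After a failover, only the compute-b service is running.
stuck = periodic_task_view(db, "compute-b", vm_state="building")
print([i.uuid for i in stuck])  # ['b1']: a1 is never checked

# If both nodes share one 'host' value, every instance stays visible.
shared = [Instance(i.uuid, "cluster-ha", i.vm_state) for i in db]
print([i.uuid for i in periodic_task_view(shared, "cluster-ha")])
# ['a1', 'a2', 'b1']
```

The second query reflects the point made above: the problem goes away only if `self.host` is identical on every HA node.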
Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
On 2/11/15, 6:35 PM, "Sylvain Bauza" wrote:
>On 11/02/2015 17:04, Gary Kotton wrote:
>> I posted a fix that does not break things and supports HA.
>> https://review.openstack.org/154029
>
>Let's be clear: HA is *not* supported by Nova now.

That is not correct. It is actually supported if the host_ip is the same. If the host_ip is not the same then there is an issue when one of the compute nodes restarts: it will delete all instances that do not have its host_ip.

>The main reason is that compute *nodes* are considered to be provided by
>the hypervisor (i.e. the virt driver run by the compute manager worker), so
>if two or more hypervisors on two distinct machines report the same
>list of nodes, then you would have duplicates.

No. There are no duplicates.

>-Sylvain

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
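Gary's point about restarts can be illustrated with a simplified model of cleanup on startup. This is a hedged sketch, not Nova's `_destroy_evacuated_instances()`: it assumes the VMware driver reports every VM in the shared cluster as local to whichever service is running, and that cleanup flags any local instance whose recorded host differs from the service's own. Real cleanup code performs additional checks that this model omits; the function name and host values are hypothetical.

```python
"""Simplified model of cleanup-on-restart when two services share a cluster.

Assumptions (hypothetical, for illustration only): the driver lists every
VM in the shared cluster, and cleanup destroys any listed VM whose
recorded host differs from this service's host.
"""


def instances_to_destroy(driver_instances, our_host):
    """Return the uuids a cleanup pass would destroy on this service's startup.

    driver_instances maps instance uuid -> host recorded in the database.
    """
    return sorted(
        uuid for uuid, recorded_host in driver_instances.items()
        if recorded_host != our_host
    )


# Every VM in the vCenter cluster, mapped to the 'host' recorded in the DB.
cluster_vms = {"a1": "compute-a", "a2": "compute-a", "b1": "compute-b"}

print(instances_to_destroy(cluster_vms, "compute-b"))  # ['a1', 'a2']
print(instances_to_destroy(cluster_vms, "compute-a"))  # ['b1']

# If both nodes use the same 'host' value, nothing is flagged.
same_host = {uuid: "cluster-ha" for uuid in cluster_vms}
print(instances_to_destroy(same_host, "cluster-ha"))  # []
```

Under these assumptions, whichever node restarts would remove the instances the other node created, which is why a shared `host` value sidesteps the problem.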
Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
On 11/02/2015 17:04, Gary Kotton wrote:
> I posted a fix that does not break things and supports HA.
> https://review.openstack.org/154029

Let's be clear: HA is *not* supported by Nova now.

The main reason is that compute *nodes* are considered to be provided by the hypervisor (i.e. the virt driver run by the compute manager worker), so if two or more hypervisors on two distinct machines report the same list of nodes, then you would have duplicates.

-Sylvain
Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
I posted a fix that does not break things and supports HA.
https://review.openstack.org/154029
Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
On 11/02/15 15:49, Gary Kotton wrote:
> Hi,
> I do not think that that is a healthy solution. That would effectively
> take a cluster down if the compute node goes down. That would be a real
> disaster. The ugly workaround is setting the host names to be the same
> value.

I don't think that's an ugly workaround. I think that's the only currently viable solution.

> This is something that we should discuss at the next summit and I would
> hope to propose a topic to talk about.

Sounds like a good plan. However, given that the bug is marked Critical, I was assuming we wanted a more expedient fix, which is what I've proposed.

Matt

--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
Re: [openstack-dev] [vmware][nova] Prevent HA configuration with different hostnames
Hi,
I do not think that that is a healthy solution. That would effectively take a cluster down if the compute node goes down. That would be a real disaster. The ugly workaround is setting the host names to be the same value.
This is something that we should discuss at the next summit and I would hope to propose a topic to talk about.
Thanks
Gary

On 2/11/15, 5:31 PM, "Matthew Booth" wrote:
>I just posted this:
>
>https://review.openstack.org/#/c/154907/
>
>as an alternative fix for critical bug:
>
>https://bugs.launchpad.net/nova/+bug/1419785
>
>I've just knocked this up quickly for illustration: it obviously needs
>plenty of cleanup. I have confirmed that it works, though.
>
>Before I take it any further, though, I'd like to get some feedback on
>the approach. I prefer this to the alternative, because the underlying
>problem is deeper than supporting evacuate. I'd prefer to be honest with
>the user and just say it ain't gonna work. The alternative would leave
>Nova running in a broken state, leaving inconsistent state in its wake
>as it runs.
>
>Matt
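Matthew's proposal is to make Nova refuse to start rather than keep running in an inconsistent state. The following is a hypothetical sketch of such a fail-fast check, not the code in review 154907. It assumes a registry mapping each hypervisor node name (for the VMware driver, a vCenter cluster) to the compute `host` that first claimed it; `check_node_ownership`, `HostMismatchError`, and the node and host names are all invented for illustration.

```python
"""Hypothetical fail-fast startup check; not the code in review 154907.

Assumes a registry mapping each hypervisor node name (e.g. a vCenter
cluster) to the compute 'host' value that first registered it.
"""


class HostMismatchError(RuntimeError):
    """Raised when a node is already owned by a different 'host' value."""


def check_node_ownership(registry, node_names, our_host):
    """Refuse to start if any node we report belongs to another host."""
    conflicts = sorted(
        (node, registry[node]) for node in node_names
        if node in registry and registry[node] != our_host
    )
    if conflicts:
        details = ", ".join("%s (owned by %s)" % c for c in conflicts)
        raise HostMismatchError(
            "Refusing to start as host %r: %s. Every service managing a "
            "node must use the same 'host' value." % (our_host, details))
    # No conflict: claim any nodes not yet registered for this host.
    for node in node_names:
        registry.setdefault(node, our_host)


registry = {}
check_node_ownership(registry, ["cluster1"], "compute-a")  # first start
check_node_ownership(registry, ["cluster1"], "compute-a")  # restart: OK
try:
    # A standby configured with a different 'host' is rejected.
    check_node_ownership(registry, ["cluster1"], "compute-b")
except HostMismatchError as exc:
    print(exc)
```

The check still allows the workaround discussed in this thread: a standby node configured with the same `host` value passes, while a differently named one fails loudly instead of silently diverging.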