Re: [openstack-dev] [heat][nova] VM restarting on host failure in convergence

Russell Bryant Wed, 17 Sep 2014 10:42:06 -0700

On 09/17/2014 09:03 AM, Jastrzebski, Michal wrote:
> In short, what we'll need from nova is to have 100% reliable
> host-health monitor and equally reliable rebuild/evacuate mechanism
> with fencing and scheduler. In heat we need scalable and reliable
> event listener and engine to decide which action to perform in given
> situation.


Unfortunately, I don't think Nova can provide this alone.  Nova only
knows about whether or not the nova-compute daemon is currently
communicating with the rest of the system.  Even if the nova-compute
daemon drops out, the compute node may still be running all instances
just fine.  We certainly don't want to impact those running workloads
unless absolutely necessary.

I understand that you're suggesting that we enhance Nova to be able to
provide that level of knowledge and control.  I actually don't think
Nova should have this knowledge of its underlying infrastructure.

I would put the host monitoring infrastructure (to determine if a host
is down) and fencing capability as out of scope for Nova and as a part
of the supporting infrastructure.  Assuming those pieces can properly
detect that a host is down and fence it, then all that's needed from
Nova is the evacuate capability, which is already there.  There may be
some enhancements that could be done to it, but surely it's quite close.
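
As a rough sketch of that division of labor (not an existing tool), the
ordering could look like the Python below. The three callables
is_host_down, fence_host and evacuate_instances are hypothetical
placeholders: the first two stand for the external monitoring and
fencing infrastructure, and only the last one would end up calling
Nova's evacuate.

```python
def recover_host(host, is_host_down, fence_host, evacuate_instances):
    """Illustrative recovery flow for a single compute host.

    All three callables are assumptions made for this sketch:
      is_host_down(host) -> bool   external monitoring, not Nova
      fence_host(host) -> bool     external fencing, True on success
      evacuate_instances(host)     thin wrapper around Nova's evacuate
    """
    if not is_host_down(host):
        # Leave running workloads alone unless the host is really down.
        return "healthy"
    if not fence_host(host):
        # Never evacuate an unfenced host: its instances may still be
        # running and writing to shared storage.
        return "fencing-failed"
    evacuate_instances(host)
    return "evacuated"
```

The point of the ordering is that evacuate is only reached after fencing
has succeeded, so the same instance cannot end up running in two places.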

There's also the part where a notification needs to go out saying that
the instance has failed.  Something (which could be Heat in the case of
this proposal) can react to that, either directly or via ceilometer, for
example.  There is an API today to hard reset the state of an instance
to ERROR.  After a host is fenced, you could use this API to mark all
instances on that host as dead.  I'm not sure if there's an easy way to
do that for all instances on a host today.  That's likely an enhancement
we could make to python-novaclient, similar to the "evacuate all
instances on a host" enhancement that was done in novaclient.
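
Until something like that exists, a minimal sketch of the loop might
look like the following. It assumes `nova` is an already-authenticated
python-novaclient Client with admin rights, and relies on its
servers.list() and servers.reset_state() calls; the function name
mark_host_instances_error is made up for this example.

```python
def mark_host_instances_error(nova, host):
    """Reset every instance on `host` to ERROR and return their IDs.

    Assumes `nova` behaves like a python-novaclient Client and that the
    caller has already confirmed the host is down and fenced.
    """
    # Filtering by host across all tenants is an admin-only query.
    servers = nova.servers.list(
        search_opts={"host": host, "all_tenants": 1})
    for server in servers:
        # Hard reset of the instance state; this does not touch the
        # hypervisor, it only changes what Nova records.
        nova.servers.reset_state(server, state="error")
    return [server.id for server in servers]
```

This only changes the recorded state, so it should run after fencing,
never as a substitute for it.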

-- 
Russell Bryant

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
