> -----Original Message-----
> From: Florian Haas [mailto:flor...@hastexo.com]
> Sent: Thursday, October 16, 2014 10:53 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Nova] Automatic evacuate
> On Thu, Oct 16, 2014 at 9:25 AM, Jastrzebski, Michal
> <michal.jastrzeb...@intel.com> wrote:
> > In my opinion flavor defining is a bit hacky. Sure, it will provide us
> > functionality fairly quickly, but also will strip us from flexibility
> > Heat would give. Healing can be done in several ways, simple destroy
> > -> create (basic convergence workflow so far), evacuate with or
> > without shared storage, even rebuild vm, probably few more when we put
> > more thoughts to it.
> But then you'd also need to monitor the availability of *individual* guest and
> down you go the rabbit hole.
> So suppose you're monitoring a guest with a simple ping. And it stops
> responding to that ping.

I was referring more to monitoring the host (not the guest), and certainly not
by ping. I was thinking of the current ZooKeeper servicegroup implementation; we
might want to use Corosync and write a servicegroup plugin for that. There are
several options for that, and each really requires testing before we make any
decision.
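
To make the host-monitoring idea concrete, here is a minimal sketch of a
heartbeat-style liveness check of the kind a servicegroup backend could
perform. The function name, the timeout value and the dictionary of
timestamps are assumptions for illustration only, not nova's actual
servicegroup API.

```python
import time

# Hypothetical timeout; a real deployment would tune this value.
DOWN_AFTER_SECONDS = 60.0


def hosts_considered_down(last_heartbeat, now=None, down_after=DOWN_AFTER_SECONDS):
    """Return the sorted names of hosts whose last heartbeat is too old.

    last_heartbeat maps a host name to the UNIX timestamp of its most
    recent report (illustrative structure, not a nova data model).
    """
    if now is None:
        now = time.time()
    return sorted(
        host for host, stamp in last_heartbeat.items()
        if now - stamp > down_after
    )
```

A missing heartbeat only says that the host stopped reporting, not why, which
is the reason fencing still has to come before any recovery action.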

There is also the fencing case, which we agree is important, and I think nova
should be able to do that (since it does evacuate, it should also do fencing).
But for working fencing we really need working host health monitoring, so I
suggest we take baby steps here and solve one issue at a time. And that would be
host health monitoring.

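The ordering argued for here (health check first, then fencing, then
evacuate) can be sketched as follows. The callables is_down, fence and
evacuate are hypothetical placeholders supplied by the caller; none of them is
an existing nova function.

```python
def recover_host(host, is_down, fence, evacuate):
    """Evacuate a host's guests only after the host was fenced successfully.

    is_down, fence and evacuate are hypothetical callables that each take
    the host name; is_down and fence return a boolean.
    Returns a short string describing what happened.
    """
    if not is_down(host):
        return "healthy"
    if not fence(host):
        # Without confirmed fencing the host may still be running guests
        # and writing to shared storage, so evacuating is not safe.
        return "fencing-failed"
    evacuate(host)
    return "evacuated"
```

The point of the sketch is only the ordering: evacuation is never reached
unless fencing reported success.
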
> (1) Has it died?
> (2) Is it just too busy to respond to the ping?
> (3) Has its guest network stack died?
> (4) Has its host vif died?
> (5) Has the L2 agent on the compute host died?
> (6) Has its host network stack died?
> (7) Has the compute host died?
> Suppose further it's using shared storage (running off an RBD volume or
> using an iSCSI volume, or whatever). Now you have almost as many recovery
> options as possible causes for the failure, and some of those recovery
> options will potentially destroy your guest's data.
> No matter how you twist and turn the problem, you need strongly consistent
> distributed VM state plus fencing. In other words, you need a full blown HA
> stack.
> > I'd rather use nova for low level task and maybe low level monitoring
> > (imho nova should do that using servicegroup). But I'd use something
> > more configurable for actual task triggering like heat. That
> > would give us framework rather than mechanism. Later we might want to
> > apply HA on network or volume, then we'll have mechanism ready just
> > monitoring hook and healing will need to be implemented.
> >
> > We can use scheduler hints to place resources on an HA-compatible host
> > (whichever health action we'd like to use); this will be a bit more
> > complicated, but will also give us more flexibility.
> I apologize in advance for my bluntness, but this all sounds to me like you're
> vastly underrating the problem of reliable guest state detection and
> recovery. :)

Guest health, in my opinion, is a bit out of scope here. If we have a robust
way of detecting host health, we can pretty much assume that if the host dies,
its guests follow.
There are ways to detect guest health (libvirt watchdog, ceilometer, ping, you
name it), but that should be done somewhere else. And certainly not by
evacuation.

> > I agree that we all should meet in Paris and discuss that so we can
> > join our forces. This is one of bigger gaps to be filled imho.
> Pretty much every user I've worked with in the last 2 years agrees.
> Granted, my view may be skewed as HA is typically what customers approach
> us for in the first place, but yes, this definitely needs a globally
> understood and supported solution.
> Cheers,
> Florian
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
