On Thu, Oct 16, 2014 at 11:01 AM, Thomas Herve
<thomas.he...@enovance.com> wrote:
>> >> This still doesn't do away with the requirement to reliably detect
>> >> node failure, and to fence misbehaving nodes. Detecting that a node
>> >> has failed, and fencing it if unsure, is a prerequisite for any
>> >> recovery action. So you need Corosync/Pacemaker anyway.
>> >
>> > Obviously, yes.  My post covered all of that directly ... the tagging
>> > bit was just additional input into the recovery operation.
>> This is essentially why I am saying using the Pacemaker stack is the
>> smarter approach than hacking something into Ceilometer and Heat. You
>> already need Pacemaker for service availability (and all major vendors
>> have adopted it for that purpose), so a highly available cloud that
>> does *not* use Pacemaker at all won't be a vendor supported option for
>> some time. So people will already be running Pacemaker — then why not
>> use it for what it's good at?
> I may be missing something, but Pacemaker will only provide monitoring of
> your compute node, right? I think the advantage you would get by using
> something like Heat is having an instance agent that provides monitoring of
> your client service, instead of just knowing the status of your hypervisor.
> Hosts can fail, but there is a whole other class of failures that you can't
> handle with the global deployment monitoring.

You *are* missing something, indeed. :) Pacemaker would also be a
perfectly fine tool for monitoring the status of the guests on your
hosts. So arguably, nova-compute could, down the road, hook in with
pcsd (https://github.com/feist/pcs/tree/master/pcs -- all in Python)
to inject VM monitoring into the Pacemaker configuration.
This would, of course, need to be specific to the hypervisor so it
would be a job for the nova driver, rather than being implemented at
the nova-compute level.
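
To make the idea a bit more concrete, here is a rough sketch of the
kind of resource definition such a hook might inject for a libvirt
guest. It only builds the argument list for a `pcs resource create`
call using the existing ocf:heartbeat:VirtualDomain resource agent; it
does not talk to pcsd. The helper name, the "vm-" resource id prefix,
the example path, and the 30s monitor interval are all assumptions
made for illustration, not anything nova or pcs provides today.

```python
def build_vm_monitor_command(domain_name, config_path,
                             hypervisor_uri="qemu:///system",
                             monitor_interval="30s"):
    """Build the argv for a `pcs resource create` call.

    Hypothetical helper: it only assembles the command line and
    never executes it.
    """
    return [
        "pcs", "resource", "create",
        "vm-" + domain_name,                # resource id (naming is an assumption)
        "ocf:heartbeat:VirtualDomain",      # OCF agent for libvirt domains
        "config=" + config_path,            # path to the libvirt domain XML
        "hypervisor=" + hypervisor_uri,     # libvirt connection URI
        "op", "monitor", "interval=" + monitor_interval,
    ]


if __name__ == "__main__":
    # Example values only; a real driver would supply its own.
    cmd = build_vm_monitor_command(
        "instance-0001", "/etc/libvirt/qemu/instance-0001.xml")
    print(" ".join(cmd))
```

The hypervisor-specific part is exactly the choice of resource agent
and its parameters, which is why this would belong in the driver.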

But my hunch is that that sort of thing would be for the L release;
for Kilo the low-hanging fruit would be to defend against host failure
(meaning, compute node failure, unrecoverable nova-compute service
failure, etc.).


OpenStack-dev mailing list
