> >> This still doesn't do away with the requirement to reliably detect
> >> node failure, and to fence misbehaving nodes. Detecting that a node
> >> has failed, and fencing it if unsure, is a prerequisite for any
> >> recovery action. So you need Corosync/Pacemaker anyway.
> >
> > Obviously, yes. My post covered all of that directly ... the tagging
> > bit was just additional input into the recovery operation.
>
> This is essentially why I am saying using the Pacemaker stack is the
> smarter approach than hacking something into Ceilometer and Heat. You
> already need Pacemaker for service availability (and all major vendors
> have adopted it for that purpose), so a highly available cloud that
> does *not* use Pacemaker at all won't be a vendor supported option for
> some time. So people will already be running Pacemaker -- then why not
> use it for what it's good at?

I may be missing something, but Pacemaker will only monitor your
compute nodes, right? I think the advantage of using something like
Heat is that you get an instance agent that monitors your client
service, instead of just knowing the status of your hypervisor. Hosts
can fail, but there is a whole other range of failures that you can't
handle with global, deployment-level monitoring.

--
Thomas

_______________________________________________
OpenStack-dev mailing list
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev