Hi Mike,

While not directly answering your question, allow me to share with you OPNFV Doctor (https://wiki.opnfv.org/doctor), a fault management and maintenance project that uses and extends OpenStack.

The team should deliver the final document today for the two-week, project-wide review. In the meantime, you can check the latest available draft here: http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2015-March/001629.html. Feedback is welcome!

Thanks,
Carlos

Carlos Gonçalves | NEC Europe Ltd. | Kurfürsten-Anlage 36 | 69115 Heidelberg | Germany | +49 6221 4342-217
NEC Europe Ltd | Registered Office: Athene, Odyssey Business Park, West End Road, London, HA4 6QE, GB | Registered in England 2832014

From: Mike Dorman [mailto:mdor...@godaddy.com]
Sent: 30 March 2015 05:26
To: OpenStack Operators
Subject: [Openstack-operators] What to do when a compute node dies?

Hi all,

I’m curious about how people deal with failures of compute nodes, as in total failure when the box is gone for good. (I mainly care about KVM hypervisors, but I’m also interested in more general cases.)

The particular situation we’re looking at: how end users could identify, or be notified of, VMs that no longer exist because their hypervisor is dead. As I understand it, Nova will still believe the VMs are running and really has no way to know anything has changed (other than that the nova-compute service on that host has dropped off).

I understand failure detection is a tricky thing, but it seems like there must be something a little better than this.

Thanks,
Mike
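One partial answer to Mike's question relies only on what Nova already records: the nova-compute service heartbeat. The sketch below is a minimal illustration, assuming Nova's database servicegroup behavior, where a service counts as down once its last heartbeat is older than `service_down_time` (60 seconds by default). The dictionaries stand in for what `nova service-list` and an admin `nova list --all-tenants --host <host>` would report; the function names and record layout are hypothetical. A stale heartbeat only shows that nova-compute stopped reporting, not that the host is really gone (it could be a network partition or a crashed service). The result is therefore a candidate list to verify, or fence, before notifying users or evacuating.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mirrors Nova's default service_down_time of 60 seconds.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """Treat a service as up if its last heartbeat is no older than down_time."""
    return now - updated_at <= down_time


def orphaned_instances(services, instances, now, down_time=SERVICE_DOWN_TIME):
    """Return {host: [instance ids]} for instances on hosts whose nova-compute
    heartbeat is stale, even though Nova may still list them as ACTIVE.

    `services` and `instances` are hypothetical plain-dict records, not
    novaclient objects.
    """
    dead_hosts = {
        s["host"]
        for s in services
        if s["binary"] == "nova-compute"
        and not is_up(s["updated_at"], now, down_time)
    }
    affected = {}
    for inst in instances:
        if inst["host"] in dead_hosts:
            affected.setdefault(inst["host"], []).append(inst["id"])
    return affected


if __name__ == "__main__":
    now = datetime(2015, 3, 30, 5, 26, tzinfo=timezone.utc)
    services = [
        {"binary": "nova-compute", "host": "hv1", "updated_at": now - timedelta(seconds=5)},
        {"binary": "nova-compute", "host": "hv2", "updated_at": now - timedelta(minutes=10)},
        # Non-compute services are ignored, even when stale.
        {"binary": "nova-scheduler", "host": "ctl1", "updated_at": now - timedelta(minutes=10)},
    ]
    instances = [
        {"id": "vm-a", "host": "hv1", "status": "ACTIVE"},
        {"id": "vm-b", "host": "hv2", "status": "ACTIVE"},
        {"id": "vm-c", "host": "hv2", "status": "ACTIVE"},
    ]
    print(orphaned_instances(services, instances, now))  # {'hv2': ['vm-b', 'vm-c']}
```

Keeping the detection logic separate from how the records are fetched makes it easy to test. It also makes it easy to swap in a different signal, such as an external monitoring or fencing system, which is closer to what a project like Doctor aims to provide.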
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators