[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?

Howley, Tom Wed, 23 Jul 2014 03:34:31 -0700

(Resending to properly start new thread.)



Hi,



I'm running an HA overcloud configuration and, as far as I'm aware, there is 
currently no mechanism in place for restarting failed nodes in the cluster. 
Originally, I had been wondering whether we would use a corosync/pacemaker 
cluster across the control plane, with STONITH resources configured for each 
node (a STONITH plugin for Ironic could be written). This might be fine if a 
corosync/pacemaker stack is already being used for HA of some components, but 
it seems like overkill otherwise. The undercloud Heat could be in a good 
position to restart the overcloud nodes. Is that the plan, or are there other 
options being considered?
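
For illustration only, the idea of an external service (such as something 
running in the undercloud) restarting failed overcloud nodes could be sketched 
as below. This is a minimal sketch under stated assumptions, not an existing 
TripleO, Heat, or Ironic mechanism: `is_alive` and `power_cycle` are 
hypothetical callbacks standing in for whatever health check and power-control 
call would actually be used, and the failure threshold is an arbitrary choice.

```python
from typing import Callable, Dict, Iterable, List


def recover_failed_nodes(
    nodes: Iterable[str],
    is_alive: Callable[[str], bool],      # hypothetical health check
    power_cycle: Callable[[str], None],   # hypothetical power control (e.g. via Ironic)
    failures: Dict[str, int],             # consecutive failure counts, kept between passes
    threshold: int = 3,                   # assumed number of missed checks before acting
) -> List[str]:
    """Run one polling pass and return the nodes that were power-cycled."""
    restarted = []
    for node in nodes:
        if is_alive(node):
            # A healthy node resets its consecutive-failure counter.
            failures[node] = 0
            continue
        failures[node] = failures.get(node, 0) + 1
        if failures[node] >= threshold:
            # Only act after several consecutive misses, to avoid
            # restarting a node because of a single transient blip.
            power_cycle(node)
            failures[node] = 0
            restarted.append(node)
    return restarted
```

A caller would invoke this function periodically, passing the same `failures` 
dictionary each time. Unlike STONITH in a corosync/pacemaker cluster, a loop 
like this provides no fencing guarantees or quorum handling, which is part of 
the trade-off raised in the question.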



Thanks,

Tom

