Re: [openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?

2014-07-25 Thread Ladislav Smola

Hi,

I believe you are looking for stack convergence in Heat. As far as I know,
it is not fully implemented yet. You can follow its progress in the blueprint:
https://blueprints.launchpad.net/heat/+spec/convergence
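
As a rough illustration of the convergence idea (this is not Heat's actual
implementation or API; the function name, the state strings and the action
names below are all hypothetical), a reconcile step compares the set of nodes
the stack should have with what is actually observed, and derives the
recovery actions from the difference:

```python
def plan_recovery(desired, observed):
    """Return the (action, node_name) pairs needed to converge.

    desired  -- set of node names the stack is supposed to contain
    observed -- dict mapping node name to its observed state string
                (hypothetical values: "ACTIVE", "ERROR", ...)
    """
    actions = []
    # Nodes that should exist: create them if missing, rebuild if unhealthy.
    for name in sorted(desired):
        state = observed.get(name)
        if state is None:
            actions.append(("create", name))
        elif state != "ACTIVE":
            actions.append(("rebuild", name))
    # Nodes that exist but are no longer part of the desired stack.
    for name in sorted(set(observed) - set(desired)):
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"controller0", "controller1", "compute0"}
    observed = {"controller0": "ACTIVE", "controller1": "ERROR"}
    for action, name in plan_recovery(desired, observed):
        print(action, name)
```

Running such a step periodically is what would let the undercloud notice a
crashed overcloud node and act on it without a separate cluster manager.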


Hope this helps.

Ladislav

On 07/23/2014 12:31 PM, Howley, Tom wrote:
> (Resending to properly start new thread.)
>
> Hi,
>
> I'm running an HA overcloud configuration and as far as I'm aware,
> there is currently no mechanism in place for restarting failed nodes
> in the cluster. Originally, I had been wondering if we would use a
> corosync/pacemaker cluster across the control plane with STONITH
> resources configured for each node (a STONITH plugin for Ironic could
> be written). This might be fine if a corosync/pacemaker stack is
> already being used for HA of some components, but it seems overkill
> otherwise. The undercloud Heat could be in a good position to restart
> the overcloud nodes -- is that the plan or are there other options
> being considered?
>
> Thanks,
>
> Tom



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?

2014-07-23 Thread Howley, Tom
(Resending to properly start new thread.)

Hi,

I'm running an HA overcloud configuration and as far as I'm aware, there is
currently no mechanism in place for restarting failed nodes in the cluster.
Originally, I had been wondering if we would use a corosync/pacemaker cluster
across the control plane with STONITH resources configured for each node (a
STONITH plugin for Ironic could be written). This might be fine if a
corosync/pacemaker stack is already being used for HA of some components, but
it seems overkill otherwise. The undercloud Heat could be in a good position to
restart the overcloud nodes -- is that the plan or are there other options
being considered?

Thanks,

Tom


[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?

2014-07-22 Thread Howley, Tom
Hi,

I'm running an HA overcloud configuration and as far as I'm aware, there is
currently no mechanism in place for restarting failed nodes in the cluster.
Originally, I had been wondering if we would use a corosync/pacemaker cluster
across the control plane with STONITH resources configured for each node (a
STONITH plugin for Ironic could be written). This might be fine if a
corosync/pacemaker stack is already being used for HA of some components, but
it seems overkill otherwise. The undercloud Heat could be in a good position to
restart the overcloud nodes -- is that the plan or are there other options
being considered?

Thanks,
Tom



Re: [openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?

2014-07-22 Thread Charles Crouch


----- Original Message -----
> Hi,
>
> I'm running an HA overcloud configuration and as far as I'm aware, there is
> currently no mechanism in place for restarting failed nodes in the cluster.
> Originally, I had been wondering if we would use a corosync/pacemaker
> cluster across the control plane with STONITH resources configured for each
> node (a STONITH plugin for Ironic could be written).

I know some people are starting to look at how to use pacemaker for
fencing/recovery with TripleO, but I'm not aware of any proposals yet.
I'm sure that as soon as one is published it will hit this list.

> This might be fine if a
> corosync/pacemaker stack is already being used for HA of some components,
> but it seems overkill otherwise.

There is a pending patch to add support for using pacemaker to deal with
active/passive (A/P) services, e.g. https://review.openstack.org/#/c/105397/
I'd expect additional patches like this in the future.

> The undercloud Heat could be in a good
> position to restart the overcloud nodes -- is that the plan or are there
> other options being considered?
>
> Thanks,
> Tom
 
 
