Re: [openstack-dev] [heat] Application level HA via Heat

2015-01-02 Thread Zane Bitter

On 24/12/14 05:17, Steven Hardy wrote:

On Mon, Dec 22, 2014 at 03:42:37PM -0500, Zane Bitter wrote:

On 22/12/14 13:21, Steven Hardy wrote:

Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1.  Turns out that
shouldn't be too hard to do:

  resources:
    server_group:
      type: OS::Heat::AutoScalingGroup
      properties:
        min_size: 1
        max_size: 1
        resource:
          type: ha_server.yaml

    server_replacement_policy:
      type: OS::Heat::ScalingPolicy
      properties:
        # FIXME: this adjustment_type doesn't exist yet
        adjustment_type: replace_oldest
        auto_scaling_group_id: {get_resource: server_group}
        scaling_adjustment: 1
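
ha_server.yaml isn't shown in this thread; a minimal sketch of what such a
nested template might contain (names and values are purely illustrative)
would be:

  # ha_server.yaml - illustrative sketch only
  heat_template_version: 2014-10-16

  parameters:
    flavor:
      type: string
      default: m1.small
    image:
      type: string

  resources:
    server:
      type: OS::Nova::Server
      properties:
        flavor: {get_param: flavor}
        image: {get_param: image}

  outputs:
    server_id:
      value: {get_resource: server}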


One potential issue with this is that it is a little bit _too_ equivalent to
HARestarter - it will replace your whole scaled unit (ha_server.yaml in this
case) rather than just the failed resource inside.


Personally I don't see that as a problem, because the interface makes that
explicit - if you put a resource in an AutoScalingGroup, you expect it to
get created/deleted on group adjustment, so anything you don't want
replaced stays outside the group.


I guess I was thinking about having the same mechanism work when the 
size of the scaling group is not fixed at 1.



Happy to consider other alternatives which do less destructive replacement,
but to me this seems like the simplest possible way to replace HARestarter
with something we can actually support long term.


Yeah, I just get uneasy about features that don't compose. Here you have 
to decide between the replacement policy feature and the feature of 
being able to scale out arbitrary stacks. The two uses are so different 
that they almost don't make sense as the same resource. The result will 
be a lot of people implementing scaling groups inside scaling groups in 
order to take advantage of both sets of behaviour.



Even if a just-replace-the-failed-resource mode is somehow made available
later, we'll still want to support AutoScalingGroup, and replace_oldest is
likely to be useful in other situations, not just this use case.

Do you have specific ideas of how the just-replace-failed-resource feature
might be implemented?  A way for a signal to declare a resource failed so
convergence auto-healing does a less destructive replacement?


So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity.  AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  - Standardize the ScalingPolicy-AutoScalingGroup interface, so
asynchronous adjustments (e.g. signals) between the two resources don't use
the adjust method.

  - Add an option to replace a member to the signal interface of
AutoScalingGroup

  - Add the new replace adjustment type to ScalingPolicy


I think I am broadly in favour of this.


Ok, great - I think we'll probably want replace_oldest, replace_newest, and
replace_specific, such that both alarm- and operator-driven replacement have
flexibility over which member is replaced.
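
To make that concrete, a replace_specific policy might look something like
this - purely illustrative, since neither the adjustment type nor the signal
field below exists yet:

  server_replace_specific:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: hypothetical adjustment type, not implemented
      adjustment_type: replace_specific
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

  # and the alarm/operator signal might carry something like:
  #   {"member_id": "<physical resource name of the failed member>"}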


We probably want to allow users to specify the replacement policy (e.g. 
oldest first vs. newest first) for the scaling group itself to use when 
scaling down or during rolling updates. If we had that, we'd probably 
only need a single replace adjustment type - if a particular member is 
specified in the message then it would replace that specific one, 
otherwise the scaling group would choose which to replace based on the 
specified policy.
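
To sketch what I mean (both the replacement_policy property and the generic
replace adjustment type are hypothetical - neither exists today):

  resources:
    server_group:
      type: OS::Heat::AutoScalingGroup
      properties:
        min_size: 2
        max_size: 2
        # hypothetical: policy the group applies when scaling down, doing
        # rolling updates, or replacing a member when none is specified
        replacement_policy: OLDEST_FIRST
        resource:
          type: ha_server.yaml

    server_replacement_policy:
      type: OS::Heat::ScalingPolicy
      properties:
        # hypothetical: one generic type; a member named in the signal is
        # replaced directly, otherwise the group's replacement_policy decides
        adjustment_type: replace
        auto_scaling_group_id: {get_resource: server_group}
        scaling_adjustment: 1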



I posted a patch which implements the first step, and the second will be
required for TripleO, so we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

2. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

- Attempt to quiesce the currently active node (may be impossible if it's
   in a bad state)

- Detach resources (e.g. volumes, primarily?) from the current active node,
   and attach them to the new active node

- Run some config action to activate the new node (e.g. run some config
   script to fsck and mount a volume, then start some application).

Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-24 Thread Steven Hardy
On Mon, Dec 22, 2014 at 03:42:37PM -0500, Zane Bitter wrote:
 On 22/12/14 13:21, Steven Hardy wrote:
 Hi all,
 
 So, lately I've been having various discussions around $subject, and I know
 it's something several folks in our community are interested in, so I
 wanted to get some ideas I've been pondering out there for discussion.
 
 I'll start with a proposal of how we might replace HARestarter with
 AutoScaling group, then give some initial ideas of how we might evolve that
 into something capable of a sort-of active/active failover.
 
 1. HARestarter replacement.
 
 My position on HARestarter has long been that equivalent functionality
 should be available via AutoScalingGroups of size 1.  Turns out that
 shouldn't be too hard to do:
 
  resources:
    server_group:
      type: OS::Heat::AutoScalingGroup
      properties:
        min_size: 1
        max_size: 1
        resource:
          type: ha_server.yaml

    server_replacement_policy:
      type: OS::Heat::ScalingPolicy
      properties:
        # FIXME: this adjustment_type doesn't exist yet
        adjustment_type: replace_oldest
        auto_scaling_group_id: {get_resource: server_group}
        scaling_adjustment: 1
 
 One potential issue with this is that it is a little bit _too_ equivalent to
 HARestarter - it will replace your whole scaled unit (ha_server.yaml in this
 case) rather than just the failed resource inside.

Personally I don't see that as a problem, because the interface makes that
explicit - if you put a resource in an AutoScalingGroup, you expect it to
get created/deleted on group adjustment, so anything you don't want
replaced stays outside the group.

Happy to consider other alternatives which do less destructive replacement,
but to me this seems like the simplest possible way to replace HARestarter
with something we can actually support long term.

Even if a just-replace-the-failed-resource mode is somehow made available
later, we'll still want to support AutoScalingGroup, and replace_oldest is
likely to be useful in other situations, not just this use case.

Do you have specific ideas of how the just-replace-failed-resource feature
might be implemented?  A way for a signal to declare a resource failed so
convergence auto-healing does a less destructive replacement?

 So, currently our ScalingPolicy resource can only support three adjustment
 types, all of which change the group capacity.  AutoScalingGroup already
 supports batched replacements for rolling updates, so if we modify the
 interface to allow a signal to trigger replacement of a group member, then
 the snippet above should be logically equivalent to HARestarter AFAICT.
 
 The steps to do this should be:
 
   - Standardize the ScalingPolicy-AutoScalingGroup interface, so
 asynchronous adjustments (e.g. signals) between the two resources don't use
 the adjust method.
 
   - Add an option to replace a member to the signal interface of
 AutoScalingGroup
 
   - Add the new replace adjustment type to ScalingPolicy
 
 I think I am broadly in favour of this.

Ok, great - I think we'll probably want replace_oldest, replace_newest, and
replace_specific, such that both alarm- and operator-driven replacement have
flexibility over which member is replaced.

 I posted a patch which implements the first step, and the second will be
 required for TripleO, so we should be doing it soon.
 
 https://review.openstack.org/#/c/143496/
 https://review.openstack.org/#/c/140781/
 
 2. A possible next step towards active/active HA failover
 
 The next part is the ability to notify before replacement that a scaling
 action is about to happen (just like we do for LoadBalancer resources
 already) and orchestrate some or all of the following:
 
 - Attempt to quiesce the currently active node (may be impossible if it's
in a bad state)
 
 - Detach resources (e.g. volumes, primarily?) from the current active node,
and attach them to the new active node
 
 - Run some config action to activate the new node (e.g. run some config
script to fsck and mount a volume, then start some application).
 
 The first step is possible by putting a SoftwareConfig/SoftwareDeployment
 resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
 node is too bricked to respond, and specifying the DELETE action so it only
 runs when we replace the resource).
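
 As a rough sketch (script contents and names are illustrative only), that
 quiesce-on-replace piece inside ha_server.yaml could look like:

   quiesce_config:
     type: OS::Heat::SoftwareConfig
     properties:
       # group: script assumes the usual script hook is present on the image
       group: script
       config: |
         #!/bin/sh
         # best-effort quiesce; this may never run if the node is dead
         systemctl stop my-application || true
         sync

   quiesce_on_replace:
     type: OS::Heat::SoftwareDeployment
     properties:
       config: {get_resource: quiesce_config}
       # 'server' is the OS::Nova::Server defined in ha_server.yaml
       server: {get_resource: server}
       # only run when the member is being deleted/replaced, and don't
       # wait for a signal in case the node is too bricked to respond
       actions: [DELETE]
       signal_transport: NO_SIGNAL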
 
 The third step is possible either via a script inside the box which polls
 for the volume attachment, or possibly via an update-only software config.
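
 The update-only variant would be the same pattern with the action reversed -
 again just a sketch, with activate_config standing in for whatever
 fsck/mount/start script the application needs:

   activate_on_update:
     type: OS::Heat::SoftwareDeployment
     properties:
       config: {get_resource: activate_config}
       server: {get_resource: server}
       # run on stack updates only, i.e. once the replacement member
       # exists and the volume has been re-attached
       actions: [UPDATE]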
 
 The second step is the missing piece AFAICS.
 
 I've been wondering if we can do something inside a new Heat resource,
 which knows what the current active member of an ASG is, and gets
 triggered on a replace signal to orchestrate, e.g., deleting and creating a
 VolumeAttachment resource to move a volume between servers.
 
 Something like:
 
   resources:
     server_group:
       type: OS::Heat::AutoScalingGroup
       properties:
         min_size: 2
         max_size: 2
         resource:
           type: ha_server.yaml

Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-24 Thread Renat Akhmerov
Hi

 Ok, I'm quite happy to accept this may be a better long-term solution, but
 can anyone comment on the current maturity level of Mistral?  Questions
 which spring to mind are:
 
 - Is the DSL stable now?

You can think “yes”, because although we keep adding new features, we do it in
a backwards-compatible manner. I personally try to be very cautious about this.

 - What's the roadmap re incubation (there are a lot of TBD's here:
https://wiki.openstack.org/wiki/Mistral/Incubation)

Ooh yeah, this page is very much obsolete, which is actually my fault: I didn’t
pay much attention to it after I heard all these rumors about the TC changing
the whole approach to getting projects incubated/integrated.

I think incubation readiness from a technical perspective is good (various
style checks, procedures, etc.); even if there’s still something we need to
adjust, it shouldn’t be difficult or time-consuming. The main question for the
last half-year has been “What OpenStack program best fits Mistral?”. So far
we’ve had two candidates: Orchestration and some new program (e.g. a Workflow
Service). However, nothing has been decided on that yet.

 - How does deferred authentication work for alarm-triggered workflows, e.g.
  if a ceilometer alarm (which authenticates as a stack domain user) needs
  to signal Mistral to start a workflow?

It works via Keystone trusts. It works, but there’s still an issue that we need
to fix: if we authenticate with a previously created trust and then try to call
Nova, it fails with an authentication error. I know this has been solved in
other projects (e.g. Heat), so we need to look at it.

 I guess a first step is creating a contrib Mistral resource and
 investigating it, but it would be great if anyone has first-hand
 experiences they can share before we burn too much time digging into it.

Yes, we’ve already started discussing how we can create a Mistral resource for
Heat. It looks like there are a couple of volunteers who can do that. Anyway,
I’m totally for it, and any help from our side can be provided (including the
implementation itself).
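
Just to give an idea, such a contrib resource might eventually look something
like this in a template (the type name, properties, and action name below are
pure guesses at this point - none of it exists yet):

  resources:
    failover_workflow:
      type: OS::Mistral::Workflow   # hypothetical - does not exist yet
      properties:
        type: direct
        tasks:
          - name: move_volume
            # illustrative placeholder - a real workflow would call
            # Mistral's OpenStack actions here
            action: my_actions.reattach_volume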



Renat Akhmerov
@ Mirantis Inc.




Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-24 Thread Clint Byrum
Excerpts from Renat Akhmerov's message of 2014-12-24 03:40:22 -0800:
 Hi
 
  Ok, I'm quite happy to accept this may be a better long-term solution, but
  can anyone comment on the current maturity level of Mistral?  Questions
  which spring to mind are:
  
  - Is the DSL stable now?
 
 You can think “yes”, because although we keep adding new features, we do it in
 a backwards-compatible manner. I personally try to be very cautious about
 this.
 
  - What's the roadmap re incubation (there are a lot of TBD's here:
 https://wiki.openstack.org/wiki/Mistral/Incubation)
 
 Ooh yeah, this page is very much obsolete, which is actually my fault: I
 didn’t pay much attention to it after I heard all these rumors about the TC
 changing the whole approach to getting projects incubated/integrated.
 
 I think incubation readiness from a technical perspective is good (various
 style checks, procedures, etc.); even if there’s still something we need to
 adjust, it shouldn’t be difficult or time-consuming. The main question for
 the last half-year has been “What OpenStack program best fits Mistral?”. So
 far we’ve had two candidates: Orchestration and some new program (e.g. a
 Workflow Service). However, nothing has been decided on that yet.
 

It's probably worth re-thinking the discussion above given the governance
changes that are being worked on:

http://governance.openstack.org/resolutions/20141202-project-structure-reform-spec.html



Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-24 Thread Renat Akhmerov
Thanks Clint,

I actually didn’t see this before (like I said, just rumors), so I need to read
it carefully.

Renat Akhmerov
@ Mirantis Inc.



 On 25 Dec 2014, at 00:18, Clint Byrum cl...@fewbar.com wrote:
 
 Excerpts from Renat Akhmerov's message of 2014-12-24 03:40:22 -0800:
 Hi
 
 Ok, I'm quite happy to accept this may be a better long-term solution, but
 can anyone comment on the current maturity level of Mistral?  Questions
 which spring to mind are:
 
 - Is the DSL stable now?
 
 You can think “yes”, because although we keep adding new features, we do it in
 a backwards-compatible manner. I personally try to be very cautious about
 this.
 
 - What's the roadmap re incubation (there are a lot of TBD's here:
   https://wiki.openstack.org/wiki/Mistral/Incubation)
 
 Ooh yeah, this page is very much obsolete, which is actually my fault: I
 didn’t pay much attention to it after I heard all these rumors about the TC
 changing the whole approach to getting projects incubated/integrated.
 
 I think incubation readiness from a technical perspective is good (various
 style checks, procedures, etc.); even if there’s still something we need to
 adjust, it shouldn’t be difficult or time-consuming. The main question for
 the last half-year has been “What OpenStack program best fits Mistral?”.
 So far we’ve had two candidates: Orchestration and some new program (e.g. a
 Workflow Service). However, nothing has been decided on that yet.
 
 
 It's probably worth re-thinking the discussion above given the governance
 changes that are being worked on:
 
 http://governance.openstack.org/resolutions/20141202-project-structure-reform-spec.html
 


Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-22 Thread Zane Bitter

On 22/12/14 13:21, Steven Hardy wrote:

Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1.  Turns out that
shouldn't be too hard to do:

  resources:
    server_group:
      type: OS::Heat::AutoScalingGroup
      properties:
        min_size: 1
        max_size: 1
        resource:
          type: ha_server.yaml

    server_replacement_policy:
      type: OS::Heat::ScalingPolicy
      properties:
        # FIXME: this adjustment_type doesn't exist yet
        adjustment_type: replace_oldest
        auto_scaling_group_id: {get_resource: server_group}
        scaling_adjustment: 1


One potential issue with this is that it is a little bit _too_ 
equivalent to HARestarter - it will replace your whole scaled unit 
(ha_server.yaml in this case) rather than just the failed resource inside.



So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity.  AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  - Standardize the ScalingPolicy-AutoScalingGroup interface, so
asynchronous adjustments (e.g. signals) between the two resources don't use
the adjust method.

  - Add an option to replace a member to the signal interface of
AutoScalingGroup

  - Add the new replace adjustment type to ScalingPolicy


I think I am broadly in favour of this.


I posted a patch which implements the first step, and the second will be
required for TripleO, so we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

2. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

- Attempt to quiesce the currently active node (may be impossible if it's
   in a bad state)

- Detach resources (e.g. volumes, primarily?) from the current active node,
   and attach them to the new active node

- Run some config action to activate the new node (e.g. run some config
   script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond, and specifying the DELETE action so it only
runs when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new Heat resource,
which knows what the current active member of an ASG is, and gets
triggered on a replace signal to orchestrate, e.g., deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

  resources:
    server_group:
      type: OS::Heat::AutoScalingGroup
      properties:
        min_size: 2
        max_size: 2
        resource:
          type: ha_server.yaml

    server_failover_policy:
      type: OS::Heat::FailoverPolicy
      properties:
        auto_scaling_group_id: {get_resource: server_group}
        resource:
          type: OS::Cinder::VolumeAttachment
          properties:
            # FIXME: refs is a ResourceGroup interface not currently
            # available in AutoScalingGroup
            instance_uuid: {get_attr: [server_group, refs, 1]}

    server_replacement_policy:
      type: OS::Heat::ScalingPolicy
      properties:
        # FIXME: this adjustment_type doesn't exist yet
        adjustment_type: replace_oldest
        auto_scaling_policy_id: {get_resource: server_failover_policy}
        scaling_adjustment: 1


This actually fails because a VolumeAttachment needs to be updated in 
place; if you try to switch servers but keep the same Volume when 
replacing the attachment you'll get an error.


TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy 
lifting here, so in theory you could just have an 
OS::Cinder::VolumeAttachment instead of the FailoverPolicy, and then all 
you need is a way of triggering a stack update with the same template and 
params. I know Ton added a PATCH method to update in Juno so that you 
don't 

Re: [openstack-dev] [heat] Application level HA via Heat

2014-12-22 Thread Angus Salkeld
On Tue, Dec 23, 2014 at 6:42 AM, Zane Bitter zbit...@redhat.com wrote:

 On 22/12/14 13:21, Steven Hardy wrote:

 Hi all,

 So, lately I've been having various discussions around $subject, and I
 know
 it's something several folks in our community are interested in, so I
 wanted to get some ideas I've been pondering out there for discussion.

 I'll start with a proposal of how we might replace HARestarter with
 AutoScaling group, then give some initial ideas of how we might evolve
 that
 into something capable of a sort-of active/active failover.

 1. HARestarter replacement.

 My position on HARestarter has long been that equivalent functionality
 should be available via AutoScalingGroups of size 1.  Turns out that
 shouldn't be too hard to do:

   resources:
     server_group:
       type: OS::Heat::AutoScalingGroup
       properties:
         min_size: 1
         max_size: 1
         resource:
           type: ha_server.yaml

     server_replacement_policy:
       type: OS::Heat::ScalingPolicy
       properties:
         # FIXME: this adjustment_type doesn't exist yet
         adjustment_type: replace_oldest
         auto_scaling_group_id: {get_resource: server_group}
         scaling_adjustment: 1


 One potential issue with this is that it is a little bit _too_ equivalent
 to HARestarter - it will replace your whole scaled unit (ha_server.yaml in
 this case) rather than just the failed resource inside.

  So, currently our ScalingPolicy resource can only support three adjustment
 types, all of which change the group capacity.  AutoScalingGroup already
 supports batched replacements for rolling updates, so if we modify the
 interface to allow a signal to trigger replacement of a group member, then
 the snippet above should be logically equivalent to HARestarter AFAICT.

 The steps to do this should be:

   - Standardize the ScalingPolicy-AutoScalingGroup interface, so
 asynchronous adjustments (e.g. signals) between the two resources don't use
 the adjust method.

   - Add an option to replace a member to the signal interface of
 AutoScalingGroup

   - Add the new replace adjustment type to ScalingPolicy


 I think I am broadly in favour of this.


  I posted a patch which implements the first step, and the second will be
 required for TripleO, so we should be doing it soon.

 https://review.openstack.org/#/c/143496/
 https://review.openstack.org/#/c/140781/

 2. A possible next step towards active/active HA failover

 The next part is the ability to notify before replacement that a scaling
 action is about to happen (just like we do for LoadBalancer resources
 already) and orchestrate some or all of the following:

 - Attempt to quiesce the currently active node (may be impossible if it's
in a bad state)
 
 - Detach resources (e.g. volumes, primarily?) from the current active node,
and attach them to the new active node
 
 - Run some config action to activate the new node (e.g. run some config
script to fsck and mount a volume, then start some application).

 The first step is possible by putting a SoftwareConfig/SoftwareDeployment
 resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
 node is too bricked to respond, and specifying the DELETE action so it only
 runs when we replace the resource).

 The third step is possible either via a script inside the box which polls
 for the volume attachment, or possibly via an update-only software config.

 The second step is the missing piece AFAICS.

 I've been wondering if we can do something inside a new Heat resource,
 which knows what the current active member of an ASG is, and gets
 triggered on a replace signal to orchestrate, e.g., deleting and creating a
 VolumeAttachment resource to move a volume between servers.

 Something like:

   resources:
     server_group:
       type: OS::Heat::AutoScalingGroup
       properties:
         min_size: 2
         max_size: 2
         resource:
           type: ha_server.yaml

     server_failover_policy:
       type: OS::Heat::FailoverPolicy
       properties:
         auto_scaling_group_id: {get_resource: server_group}
         resource:
           type: OS::Cinder::VolumeAttachment
           properties:
             # FIXME: refs is a ResourceGroup interface not currently
             # available in AutoScalingGroup
             instance_uuid: {get_attr: [server_group, refs, 1]}

     server_replacement_policy:
       type: OS::Heat::ScalingPolicy
       properties:
         # FIXME: this adjustment_type doesn't exist yet
         adjustment_type: replace_oldest
         auto_scaling_policy_id: {get_resource: server_failover_policy}
         scaling_adjustment: 1


 This actually fails because a VolumeAttachment needs to be updated in
 place; if you try to switch servers but keep the same Volume when replacing
 the attachment you'll get an error.

 TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting
 here, so in theory you could just have an OS::Cinder::VolumeAttachment
 instead of the