On Tue, Dec 23, 2014 at 6:42 AM, Zane Bitter <zbit...@redhat.com> wrote:

> On 22/12/14 13:21, Steven Hardy wrote:
>
>> Hi all,
>>
>> So, lately I've been having various discussions around $subject, and I
>> know it's something several folks in our community are interested in, so
>> I wanted to get some ideas I've been pondering out there for discussion.
>>
>> I'll start with a proposal of how we might replace HARestarter with an
>> AutoScalingGroup, then give some initial ideas of how we might evolve
>> that into something capable of a sort-of active/active failover.
>>
>> 1. HARestarter replacement.
>>
>> My position on HARestarter has long been that equivalent functionality
>> should be available via AutoScalingGroups of size 1.  Turns out that
>> shouldn't be too hard to do:
>>
>>   resources:
>>    server_group:
>>      type: OS::Heat::AutoScalingGroup
>>      properties:
>>        min_size: 1
>>        max_size: 1
>>        resource:
>>          type: ha_server.yaml
>>
>>    server_replacement_policy:
>>      type: OS::Heat::ScalingPolicy
>>      properties:
>>        # FIXME: this adjustment_type doesn't exist yet
>>        adjustment_type: replace_oldest
>>        auto_scaling_group_id: {get_resource: server_group}
>>        scaling_adjustment: 1
>>
>
> One potential issue with this is that it is a little bit _too_ equivalent
> to HARestarter - it will replace your whole scaled unit (ha_server.yaml in
> this case) rather than just the failed resource inside.
>
>> So, currently our ScalingPolicy resource can only support three adjustment
>> types, all of which change the group capacity.  AutoScalingGroup already
>> supports batched replacements for rolling updates, so if we modify the
>> interface to allow a signal to trigger replacement of a group member, then
>> the snippet above should be logically equivalent to HARestarter AFAICT.
>>
>> The steps to do this should be:
>>
>>   - Standardize the ScalingPolicy-AutoScalingGroup interface, so
>> asynchronous adjustments (e.g. signals) between the two resources don't use
>> the "adjust" method.
>>
>>   - Add an option to replace a member to the signal interface of
>> AutoScalingGroup
>>
>>   - Add the new "replace" adjustment type to ScalingPolicy
>>
>
> I think I am broadly in favour of this.
>
>
>> I posted a patch which implements the first step, and the second will be
>> required for TripleO, i.e. we should be doing it soon.
>>
>> https://review.openstack.org/#/c/143496/
>> https://review.openstack.org/#/c/140781/
>>
>> 2. A possible next step towards active/active HA failover
>>
>> The next part is the ability to notify before replacement that a scaling
>> action is about to happen (just like we do for LoadBalancer resources
>> already) and orchestrate some or all of the following:
>>
>> - Attempt to quiesce the currently active node (may be impossible if it's
>>    in a bad state)
>>
>> - Detach resources (e.g volumes primarily?) from the current active node,
>>    and attach them to the new active node
>>
>> - Run some config action to activate the new node (e.g run some config
>>    script to fsck and mount a volume, then start some application).
>>
>> The first step is possible by putting a SoftwareConfig/SoftwareDeployment
>> resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
>> node is too bricked to respond and specifying DELETE action so it only
>> runs when we replace the resource).
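
As a rough sketch of that first step, ha_server.yaml might look something
like the following. The parameter names, the resource names and the script
body (service name, mount point) are placeholders made up for illustration,
not anything from the proposal; only the NO_SIGNAL transport and the DELETE
action come from the description above.

```yaml
heat_template_version: 2014-10-16

parameters:
  image:
    type: string
  flavor:
    type: string

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: {get_param: image}
      flavor: {get_param: flavor}
      # Needed so the server can consume software deployments
      user_data_format: SOFTWARE_CONFIG

  quiesce_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        # Hypothetical quiesce actions; replace with whatever the app needs.
        service myapp stop || true
        umount /mnt/data || true

  quiesce_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: quiesce_config}
      server: {get_resource: server}
      # Only run when the resource is deleted, i.e. on replacement
      actions: [DELETE]
      # Don't wait for a reply, so a dead node can't block the delete
      signal_transport: NO_SIGNAL
```

With NO_SIGNAL the quiesce is best effort: Heat will not know whether the
script actually ran before the server went away.
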
>>
>> The third step is possible either via a script inside the box which polls
>> for the volume attachment, or possibly via an update-only software config.
>>
>> The second step is the missing piece AFAICS.
>>
>> I've been wondering if we can do something inside a new heat resource,
>> which knows what the current "active" member of an ASG is, and gets
>> triggered on a "replace" signal to orchestrate e.g deleting and creating a
>> VolumeAttachment resource to move a volume between servers.
>>
>> Something like:
>>
>>   resources:
>>    server_group:
>>      type: OS::Heat::AutoScalingGroup
>>      properties:
>>        min_size: 2
>>        max_size: 2
>>        resource:
>>          type: ha_server.yaml
>>
>>    server_failover_policy:
>>      type: OS::Heat::FailoverPolicy
>>      properties:
>>        auto_scaling_group_id: {get_resource: server_group}
>>        resource:
>>          type: OS::Cinder::VolumeAttachment
>>          properties:
>>              # FIXME: "refs" is a ResourceGroup interface not currently
>>              # available in AutoScalingGroup
>>              instance_uuid: {get_attr: [server_group, refs, 1]}
>>
>>    server_replacement_policy:
>>      type: OS::Heat::ScalingPolicy
>>      properties:
>>        # FIXME: this adjustment_type doesn't exist yet
>>        adjustment_type: replace_oldest
>>        auto_scaling_policy_id: {get_resource: server_failover_policy}
>>        scaling_adjustment: 1
>>
>
> This actually fails because a VolumeAttachment needs to be updated in
> place; if you try to switch servers but keep the same Volume when replacing
> the attachment you'll get an error.
>
> TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting
> here, so in theory you could just have an OS::Cinder::VolumeAttachment
> instead of the FailoverPolicy and then all you need is a way of triggering
> a stack update with the same template & params. I know Ton added a PATCH
> method to update in Juno so that you don't have to pass parameters any
> more, and I believe it's planned to do the same with the template.
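
For illustration, the simpler variant described above (a plain
VolumeAttachment in place of the FailoverPolicy) might look roughly like
this. It still assumes the "refs" attribute that AutoScalingGroup does not
expose yet, and the volume_id parameter and resource names are made up for
the example.

```yaml
heat_template_version: 2014-10-16

parameters:
  volume_id:
    type: string
    description: ID of the shared volume (hypothetical parameter)

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  active_volume_attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      volume_id: {get_param: volume_id}
      # FIXME: "refs" is a ResourceGroup interface not currently
      # available in AutoScalingGroup
      instance_uuid: {get_attr: [server_group, refs, 1]}
```

Failover would then amount to a stack update with the same template and
parameters, subject to the attachment replacement problem noted above.
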
>
>> By chaining policies like this we could trigger an update on the
>> attachment resource (or a nested template via a provider resource
>> containing many attachments or other resources) every time the
>> ScalingPolicy is triggered.
>>
>> For the sake of clarity, I've not included the existing stuff like
>> ceilometer alarm resources etc above, but hopefully it gets the idea
>> across so we can discuss further. What are people's thoughts?  I'm quite
>> happy to iterate on the idea if folks have suggestions for a better
>> interface etc :)
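
For reference, the omitted alarm wiring would presumably follow the usual
autoscaling pattern, something like the fragment below. The meter name,
threshold and period are hypothetical placeholders; which meter actually
indicates a failed server depends on the deployment.

```yaml
resources:
  # server_group and server_replacement_policy as in the snippet above

  server_failure_alarm:
    type: OS::Ceilometer::Alarm
    properties:
      description: Signal the replacement policy when the server looks dead
      meter_name: instance_health   # hypothetical meter, not a real one
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 1
      comparison_operator: lt
      alarm_actions:
        # Signalling the policy's alarm URL triggers the adjustment
        - {get_attr: [server_replacement_policy, alarm_url]}
```
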
>>
>> One problem I see with the above approach is you'd have to trigger a
>> failover after stack create to get the initial volume attached, still
>> pondering ideas on how best to solve that..
>>
>
> To me this is falling into the same old trap of "hey, we want to run this
> custom workflow, all we need to do is add a new resource type to hang some
> code on". That's pretty much how we got HARestarter.
>
> Also, like HARestarter, this cannot hope to cover the range of possible
> actions that might be needed by various applications.
>
> IMHO the "right" way to implement this is that the Ceilometer alarm
> triggers a workflow in Mistral that takes the appropriate action defined by
> the user, which may (or may not) include updating the Heat stack to a new
> template where the shared storage gets attached to a different server.
>
>
I agree, we should really be changing our policies to be implemented as
Mistral workflows. A good first step would be to have a Mistral workflow
Heat resource, so that users can start getting more flexibility in what they
do with alarm actions.

-Angus


> cheers,
> Zane.
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>