Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Zane Bitter Tue, 04 Feb 2014 16:20:08 -0800

On 03/02/14 17:09, Clint Byrum wrote:

Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:

So, I wrote the original rolling updates spec about a year ago, and the
time has come to get serious about implementation. I went through it and
basically rewrote the entire thing to reflect the knowledge I have
gained from a year of working with Heat.


Any and all comments are welcome. I intend to start implementation very
soon, as this is an important component of the HA story for TripleO:

https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates


Hi Clint, thanks for pushing this.

First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
different entities. The second just looks like a parametrization of the first 
(growth_factor=1?).


Perhaps they can just be one. Until I find parameters which would need
to mean something different, I'll just use UpdatePattern.


I then feel that using (abusing?) depends_on for update pattern is a bit weird. 
Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute 
feels better (although I would probably use a property). I guess my main 
question is around the meaning of using the update pattern on a server 
instance. I think I see what you want to do for the group, where child_updating 
would return a number, but I have no idea what it means for a single resource. 
Could you detail the operation a bit more in the document?


I would be o-k with adding another keyword. The idea in abusing depends_on
is that it changes the core language less. Properties is definitely out
for the reasons Christopher brought up, properties is really meant to
be for the resource's end target only.

Agree, -1 for properties - those belong to the resource, and this data
belongs to Heat.

UpdatePolicy in cfn is a single string, and causes very generic rolling


Huh?

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

Not only is it not just a single string (in fact, it looks a lot like
the properties you have defined), it's even got another layer of
indirection so you can define different types of update policy (rolling
vs. canary, anybody?). It's an extremely flexible syntax.

BTW, given that we already implemented this in autoscaling, it might be
helpful to talk more specifically about what we need to do in addition
in order to support the use cases you have in mind.

update behavior. I want this resource to be able to control multiple
groups as if they are one in some cases (Such as a case where a user
has migrated part of an app to a new type of server, but not all.. so
they will want to treat the entire aggregate as one rolling update).

I'm o-k with overloading it to allow resource references, but I'd like
to hear more people take issue with depends_on before I select that
course.

Resource references in general, and depends_on in particular, feel like
very much the wrong abstraction to me. This is a policy, not a resource.

To answer your question, using it with a server instance allows
rolling updates across non-grouped resources. In the example the
rolling_update_dbs does this.

That's not a great example, because one DB server depends on the other,
forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying
update policies to non-grouped resources doesn't make a whole lot of
sense to me. For non-grouped resources you control the resource
definitions individually - if you don't want them to update at a
particular time, you have the option of just not updating them.

Where you _do_ need it is for scaling groups where every server is based
on the same launch config, so you need a way to control the members
individually - by batching up operations (done), adding delays (done)
or, even better, notifications and callbacks.
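
To make the batching-plus-delay idea concrete, here is a minimal sketch
in Python. It is not Heat's autoscaling code: plan_batches,
rolling_update and every parameter name are hypothetical, loosely
modelled on the max-batch-size / min-in-service / pause-time knobs in
the cfn UpdatePolicy linked above.

```python
import time
from typing import Callable, List, Sequence


def plan_batches(members: Sequence[str], max_batch_size: int,
                 min_in_service: int = 0) -> List[List[str]]:
    """Split group members into consecutive update batches.

    Hypothetical helper: the batch size is capped so that, where
    possible, min_in_service members stay untouched during a batch.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    # Members we can afford to take out of service at once.
    spare = len(members) - min_in_service
    # Assumption: if nothing can be spared we still make progress,
    # one member at a time, rather than refusing to update at all.
    batch_size = max(1, min(max_batch_size, spare))
    return [list(members[i:i + batch_size])
            for i in range(0, len(members), batch_size)]


def rolling_update(members: Sequence[str],
                   update_one: Callable[[str], None],
                   max_batch_size: int,
                   min_in_service: int = 0,
                   pause_seconds: float = 0.0,
                   sleep: Callable[[float], None] = time.sleep
                   ) -> List[List[str]]:
    """Update members batch by batch, pausing between batches."""
    batches = plan_batches(members, max_batch_size, min_in_service)
    for index, batch in enumerate(batches):
        for member in batch:
            update_one(member)
        # No pause is needed after the final batch.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return batches
```

The sleep callable is injectable only so the delay can be replaced in a
test; a notification/callback scheme would replace that fixed pause
with a wait on some external signal.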

So it seems like doing 'rolling' updates for any random subset of
resources is effectively turning Heat into something of a poor-man's
workflow service, and IMHO that is probably a mistake.

What we do need for all resources (not just scaling groups) is a way for
the user to say "for this particular resource, notify me when it has
updated (but, if possible, before we have taken any destructive actions
on it), give me a chance to test it and accept or reject the update".
For example, when you resize a server, give the user a chance to confirm
or reject the change at the VERIFY_RESIZE step (Trove requires this). Or
when you replace a server during an update, give the user a chance to
test the new server and either keep it (continue on and delete the old
one) or not (roll back). Or when you replace a server in a scaling
group, notify the load balancer _or some other thing_ (e.g. OpenShift
broker node) that a replacement has been created and wait for it to
switch over to the new one before deleting the old one. Or, of course,
when you update a server to some new config, give the user a chance to
test it out and make sure it works before continuing with the stack
update. All of these use cases can, I think, be solved with a single
feature.


The open questions for me are:

1) How do we notify the user that it's time to check on a resource?
(Marconi?)
2) How does the user ack/nack? (You're suggesting reusing WaitCondition,
and that makes sense to me.)
3) How do we break up the operations so the notification occurs at the
right time? (With difficulty, but it should be do-able.)
4) How does the user indicate for which resources they want to be
notified? (Inside an update_policy? Another new directive at the
type/properties/depends_on/update_policy level?)
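
To show how those four pieces might fit together, here is a rough
sketch in Python of the notify / ack-or-nack flow. Nothing here is an
existing Heat interface: Verdict, update_with_confirmation, the
callbacks and the status strings are all hypothetical, and the
transport behind notify and wait_for_verdict (Marconi, a WaitCondition,
something else) is deliberately left abstract.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Set


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def update_with_confirmation(
        resources: Iterable[str],
        watched: Set[str],
        apply_update: Callable[[str], None],
        notify: Callable[[str], None],
        wait_for_verdict: Callable[[str], Verdict],
        finalize: Callable[[str], None],
        roll_back: Callable[[str], None]) -> Dict[str, str]:
    """Update resources in order, pausing for the user on watched ones.

    apply_update is the non-destructive half of the operation (e.g. the
    replacement server exists but the old one is still there); finalize
    is the destructive half (e.g. delete the old server).
    """
    results: Dict[str, str] = {}
    for name in resources:
        apply_update(name)
        if name in watched:
            # Question 1: tell the user it is time to check.
            notify(name)
            # Question 2: block until the user acks or nacks.
            if wait_for_verdict(name) is Verdict.REJECT:
                roll_back(name)
                results[name] = "ROLLED_BACK"
                # Assumption: a rejection stops the rest of the update.
                break
        finalize(name)
        results[name] = "COMPLETE"
    return results
```

Question 3 shows up as the split between apply_update and finalize, and
question 4 as the watched set, which in a template would have to come
from whatever directive we settle on.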

It also seems that the interface you're creating (child_creating/child_updating) is fairly specific 
to your use case. For autoscaling we have a need for a more generic notification system, it would be 
nice to find common grounds. Maybe we can invert the relationship? Add a 
"notified_resources" attribute, which would call hooks on the "parent" when 
actions are happening.


I'm open to a different interface design. I don't really have a firm
grasp of the generic behavior you'd like to model though. This is quite
concrete and would be entirely hidden from template authors, though not
from resource plugin authors. Attributes sound like something where you
want the template authors to get involved in specifying, but maybe that
was just an overloaded term.

So perhaps we can replace this interface with the generic one when your
use case is more clear?

I'm not sure about the implementation Thomas proposed, but I believe the
use case he has in mind is the third of the four I listed above (replace
a server in a scaling group).


cheers,
Zane.

