On 03/02/14 17:09, Clint Byrum wrote:
Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
So, I wrote the original rolling updates spec about a year ago, and the
time has come to get serious about implementation. I went through it and
basically rewrote the entire thing to reflect the knowledge I have
gained from a year of working with Heat.

Any and all comments are welcome. I intend to start implementation very
soon, as this is an important component of the HA story for TripleO:

https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates

Hi Clint, thanks for pushing this.

First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
different entities. The second just looks like a parametrization of the first 
(growth_factor=1?).

Perhaps they can just be one. Until I find parameters which would need
to mean something different, I'll just use UpdatePattern.
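
To make the "one pattern, different parameters" idea concrete, here is a minimal sketch. It assumes a reading of growth_factor that is not confirmed by the spec: the batch size starts at a hypothetical initial_batch and is multiplied by growth_factor after each batch. Under that assumption, growth_factor=1 gives fixed-size batches (plain rolling) and a larger factor gives a canary-style ramp-up. The function name and both parameters are illustrative only.

```python
def batch_sizes(total, initial_batch=1, growth_factor=1):
    """Return the size of each update batch for `total` group members.

    Assumed semantics (hypothetical, not taken from the spec): the batch
    size starts at `initial_batch` and is multiplied by `growth_factor`
    after every batch; the last batch is truncated to what remains.
    """
    if total < 0 or initial_batch < 1 or growth_factor < 1:
        raise ValueError("total must be >= 0; initial_batch and growth_factor must be >= 1")
    sizes = []
    remaining = total
    size = initial_batch
    while remaining > 0:
        step = min(size, remaining)  # never update more members than are left
        sizes.append(step)
        remaining -= step
        size *= growth_factor
    return sizes


# Same function, two parametrizations.
rolling = batch_sizes(10, initial_batch=2, growth_factor=1)
canary = batch_sizes(10, initial_batch=1, growth_factor=2)
```

If the two patterns really differ only in numbers like these, a single UpdatePattern with parameters would cover both.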


I then feel that using (abusing?) depends_on for update pattern is a bit weird. 
Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute 
feels better (although I would probably use a property). I guess my main 
question is around the meaning of using the update pattern on a server 
instance. I think I see what you want to do for the group, where child_updating 
would return a number, but I have no idea what it means for a single resource. 
Could you detail the operation a bit more in the document?


I would be o-k with adding another keyword. The idea in abusing depends_on
is that it changes the core language less. Properties is definitely out
for the reasons Christopher brought up: properties are really meant to
be for the resource's end target only.

Agree, -1 for properties - those belong to the resource, and this data belongs to Heat.

UpdatePolicy in cfn is a single string, and causes very generic rolling

Huh?

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

Not only is it not just a single string (in fact, it looks a lot like the properties you have defined), it's even got another layer of indirection so you can define different types of update policy (rolling vs. canary, anybody?). It's an extremely flexible syntax.
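
For reference, the attribute described at that link is a nested mapping keyed by policy type, and it sits next to Type and Properties instead of inside Properties. A minimal sketch as a Python dict: the key names AutoScalingRollingUpdate, MinInstancesInService, MaxBatchSize and PauseTime come from the CFN documentation, all values are illustrative, and the policy_types helper is hypothetical.

```python
# Illustrative resource definition; values are made up for the example.
asg = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "Properties": {
        "AvailabilityZones": {"Fn::GetAZs": ""},
        "LaunchConfigurationName": {"Ref": "LaunchConfig"},
        "MinSize": "2",
        "MaxSize": "10",
    },
    # UpdatePolicy is a sibling of Properties, not a property itself.
    "UpdatePolicy": {
        # The outer key selects the kind of policy (the extra layer of indirection).
        "AutoScalingRollingUpdate": {
            "MinInstancesInService": "2",
            "MaxBatchSize": "1",
            "PauseTime": "PT5M",  # ISO 8601 duration: pause 5 minutes between batches
        }
    },
}


def policy_types(resource):
    """Hypothetical helper: list the policy kinds declared on a resource."""
    return sorted(resource.get("UpdatePolicy", {}))
```

Here policy_types(asg) returns only the rolling-update kind; a canary-style policy could be added as another outer key without touching Properties.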

BTW, given that we already implemented this in autoscaling, it might be helpful to talk more specifically about what we need to do in addition in order to support the use cases you have in mind.

update behavior. I want this resource to be able to control multiple
groups as if they are one in some cases (such as a case where a user
has migrated part of an app to a new type of server, but not all, so
they will want to treat the entire aggregate as one rolling update).

I'm o-k with overloading it to allow resource references, but I'd like
to hear more people take issue with depends_on before I select that
course.

Resource references in general, and depends_on in particular, feel like very much the wrong abstraction to me. This is a policy, not a resource.

To answer your question, using it with a server instance allows
rolling updates across non-grouped resources. In the example the
rolling_update_dbs does this.

That's not a great example, because one DB server depends on the other, forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying update policies to non-grouped resources doesn't make a whole lot of sense to me. For non-grouped resources you control the resource definitions individually - if you don't want them to update at a particular time, you have the option of just not updating them.

Where you _do_ need it is for scaling groups where every server is based on the same launch config, so you need a way to control the members individually - by batching up operations (done), adding delays (done) or, even better, notifications and callbacks.

So it seems like doing 'rolling' updates for any random subset of resources is effectively turning Heat into something of a poor-man's workflow service, and IMHO that is probably a mistake.

What we do need for all resources (not just scaling groups) is a way for the user to say "for this particular resource, notify me when it has updated (but, if possible, before we have taken any destructive actions on it), give me a chance to test it and accept or reject the update". For example:

- When you resize a server, give the user a chance to confirm or reject the change at the VERIFY_RESIZE step (Trove requires this).
- When you replace a server during an update, give the user a chance to test the new server and either keep it (continue on and delete the old one) or not (roll back).
- When you replace a server in a scaling group, notify the load balancer _or some other thing_ (e.g. OpenShift broker node) that a replacement has been created and wait for it to switch over to the new one before deleting the old one.
- When you update a server to some new config, give the user a chance to test it out and make sure it works before continuing with the stack update.

All of these use cases can, I think, be solved with a single feature.

The open questions for me are:
1) How do we notify the user that it's time to check on a resource? (Marconi?)
2) How does the user ack/nack? (You're suggesting reusing WaitCondition, and that makes sense to me.)
3) How do we break up the operations so the notification occurs at the right time? (With difficulty, but it should be do-able.)
4) How does the user indicate for which resources they want to be notified? (Inside an update_policy? Another new directive at the type/properties/depends_on/update_policy level?)
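
The control flow behind those questions can be sketched roughly as follows. This is only an illustration, not a proposed Heat interface: every name here (Verdict, update_with_confirmation, and the callables passed in) is hypothetical, and the notification and ack/nack transports are left as plain callables because questions 1 and 2 are still open.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def update_with_confirmation(resource, apply, notify, wait_for_verdict,
                             finalize, rollback):
    """Update `resource`, then let the user accept or reject the result.

    All callables are hypothetical stand-ins:
      apply            - the non-destructive part of the update
      notify           - tell the user it is time to check (question 1)
      wait_for_verdict - block until the user acks or nacks (question 2)
      finalize         - the destructive part, e.g. deleting the old server
      rollback         - undo the non-destructive part
    Returns True if the update was accepted, False if it was rolled back.
    """
    apply(resource)
    notify(resource)
    verdict = wait_for_verdict(resource)
    if verdict is Verdict.ACCEPT:
        finalize(resource)
        return True
    rollback(resource)
    return False


# Usage: record what happens when the user rejects the update.
events = []
accepted = update_with_confirmation(
    "server-1",
    apply=lambda r: events.append(("apply", r)),
    notify=lambda r: events.append(("notify", r)),
    wait_for_verdict=lambda r: Verdict.REJECT,
    finalize=lambda r: events.append(("finalize", r)),
    rollback=lambda r: events.append(("rollback", r)),
)
```

The hard part (question 3) is hidden inside the split between apply and finalize: each resource type would need its update broken at the point just before anything destructive happens.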

It also seems that the interface you're creating (child_creating/child_updating) is fairly specific 
to your use case. For autoscaling we have a need for a more generic notification system, and it would be 
nice to find common ground. Maybe we can invert the relationship? Add a 
"notified_resources" attribute, which would call hooks on the "parent" when 
actions are happening.
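
As a rough illustration of the inverted relationship, here is a minimal sketch in which the child lists who to notify and calls a hook on each of them. Only the name notified_resources comes from the suggestion above; the Resource class, handle_notification and perform are hypothetical and do not correspond to existing Heat plugin methods.

```python
class Resource:
    """Toy stand-in for a resource plugin (not the real Heat class)."""

    def __init__(self, name, notified_resources=()):
        self.name = name
        # Resources whose hook is called whenever this resource acts.
        self.notified_resources = list(notified_resources)
        self.received = []

    def handle_notification(self, source, action):
        """Hook called on the "parent"; here it only records the event."""
        self.received.append((source.name, action))

    def perform(self, action):
        """Notify every listed resource that `action` is happening here."""
        for target in self.notified_resources:
            target.handle_notification(self, action)


# The member points at the group, instead of the group polling its children.
group = Resource("web_group")
member = Resource("web_server_0", notified_resources=[group])
member.perform("UPDATE")
```

After member.perform("UPDATE"), the group has recorded one event from web_server_0, which is where a rolling-update or autoscaling policy could decide what to do next.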


I'm open to a different interface design. I don't really have a firm
grasp of the generic behavior you'd like to model though. This is quite
concrete and would be entirely hidden from template authors, though not
from resource plugin authors. Attributes sound like something where you
want the template authors to get involved in specifying, but maybe that
was just an overloaded term.

So perhaps we can replace this interface with the generic one when your
use case is more clear?

I'm not sure about the implementation Thomas proposed, but I believe the use case he has in mind is the third of the four I listed above (replace a server in a scaling group).

cheers,
Zane.

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
