Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Jay Dobies

First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
different entities. The second just looks like a parametrization of the first 
(growth_factor=1?).


Perhaps they can just be one. Until I find parameters which would need
to mean something different, I'll just use UpdatePattern.


I wondered about this too. Maybe I'm just not as familiar with the 
terminology, but since we're stopping on all failures, both function as a 
canary testing the waters before doing the update. The only 
difference is the potential for acceleration.


As for an example of an entirely different strategy, what about the idea 
of standing up new instances with the updates and then killing off the 
old ones? It may come down to me not fully understanding the scale of 
what you mean by "updating configuration", but it may be desirable not to 
scale down your capacity while the update is executing and instead 
have a quick changeover (for instance, in the floating IPs or a load 
balancer).



I then feel that using (abusing?) depends_on for update pattern is a bit weird. 
Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute 
feels better (although I would probably use a property). I guess my main 
question is around the meaning of using the update pattern on a server 
instance. I think I see what you want to do for the group, where child_updating 
would return a number, but I have no idea what it means for a single resource. 
Could you detail the operation a bit more in the document?



I would be o-k with adding another keyword. The idea in abusing depends_on
is that it changes the core language less. Properties is definitely out
for the reasons Christopher brought up; properties are really meant to
be for the resource's end target only.


I think depends_on would be a clever use of the existing language if we 
weren't in a position to influence its evolution. A resource's update 
policy is a first-class concept IMO, so adding that notion directly into 
the definition feels cleaner.
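
To make the comparison concrete, here is a rough sketch of the two shapes
under discussion (type, keyword, and property names here are illustrative,
not taken from the spec):

  resources:
    update_pattern:
      type: OS::Heat::UpdatePattern        # hypothetical merged pattern type
      properties:
        min_in_service: 1

    # the spec's current proposal: overload depends_on
    db_server_1:
      type: OS::Nova::Server
      depends_on: update_pattern

    # the alternative: a first-class keyword next to type/properties
    db_server_2:
      type: OS::Nova::Server
      update_policy:
        pattern: {get_resource: update_pattern}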


[snip]



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Steven Dake

On 02/04/2014 06:34 PM, Robert Collins wrote:

On 5 February 2014 13:14, Zane Bitter zbit...@redhat.com wrote:



That's not a great example, because one DB server depends on the other,
forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying update
policies to non-grouped resources doesn't make a whole lot of sense to me.
For non-grouped resources you control the resource definitions individually
- if you don't want them to update at a particular time, you have the option
of just not updating them.

Well, I don't particularly like the idea of doing thousands of
discrete heat stack-update calls, which would seem to be what you're
proposing.

On groups: autoscale groups are a problem for secure-minded
deployments because every server has identical resources (today) and
we very much want discrete credentials per server - at least this is
my understanding of the reason we're not using scaling groups in
TripleO.


Where you _do_ need it is for scaling groups where every server is based on
the same launch config, so you need a way to control the members
individually - by batching up operations (done), adding delays (done) or,
even better, notifications and callbacks.

So it seems like doing 'rolling' updates for any random subset of resources
is effectively turning Heat into something of a poor-man's workflow service,
and IMHO that is probably a mistake.

I mean to reply to the other thread, but here is just as good :) -
heat as a way to describe the intended state, and heat takes care of
transitions, is a brilliant model. It absolutely implies a bunch of
workflows - the AWS update policy is probably the key example.

Being able to gracefully, *automatically* work through a transition
between two defined states, allowing the nodes in question to take
care of their own needs along the way, seems like a pretty core
function to fit inside Heat itself. It's not at all the same as 'allow
users to define arbitrary workflows'.

-Rob

Rob,

I'm not precisely certain what you're proposing, but I think we need to 
take care not to turn the Heat DSL into a full-fledged programming 
language.  IMO thousands of updates done through Heat is a perfect way 
for a third party service to do such things - e.g. control workflow.  
Clearly there is a workflow gap in OpenStack, and possibly that thing 
doing the thousands of updates should be a workflow service, rather than 
TripleO, but workflow is out of scope for Heat proper.  Such a workflow 
service could potentially fit in the Orchestration program alongside 
Heat and Autoscaling.  It is too bad there isn't a workflow service 
already, because we are getting a lot of pressure to make Heat fill this 
gap.  I personally believe filling this gap with Heat would be a mistake 
and the correct course of action would be for a workflow service to 
emerge to fill this need (and depend on Heat for orchestration).


I believe this may be what Zane is reacting to; I believe the Heat 
community would like to avoid making the DSL more programmable because 
then it is harder to use and support.  The parameters, resources, outputs 
DSL objects are difficult enough for new folks to pick up, and it's only 3 
things to understand...


Regards
-steve




What we do need for all resources (not just scaling groups) is a way for the
user to say "for this particular resource, notify me when it has updated
(but, if possible, before we have taken any destructive actions on it), give
me a chance to test it and accept or reject the update". For example, when
you resize a server, give the user a chance to confirm or reject the change
at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
server during an update, give the user a chance to test the new server and
either keep it (continue on and delete the old one) or not (roll back). Or
when you replace a server in a scaling group, notify the load balancer _or
some other thing_ (e.g. OpenShift broker node) that a replacement has been
created and wait for it to switch over to the new one before deleting the
old one. Or, of course, when you update a server to some new config, give
the user a chance to test it out and make sure it works before continuing
with the stack update. All of these use cases can, I think, be solved with a
single feature.

The open questions for me are:
1) How do we notify the user that it's time to check on a resource?
(Marconi?)

This is the graceful update stuff I referred to in my mail to Clint -
the proposal from hallway discussions in HK was to do this by
notifying the server itself (that way we don't create a centralised
point of failure). I can see though that in a general sense not all
resources are servers. But - how about allowing to specify where to
notify (and notifying is always done by setting a value in metadata
somewhere) - users can then pull that out themselves however they want
to. Adding push notifications is orthogonal IMO - we'd like that for
all metadata changes, for instance.

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-02-04 16:14:09 -0800:
 On 03/02/14 17:09, Clint Byrum wrote:
  Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
  So, I wrote the original rolling updates spec about a year ago, and the
  time has come to get serious about implementation. I went through it and
  basically rewrote the entire thing to reflect the knowledge I have
  gained from a year of working with Heat.
 
  Any and all comments are welcome. I intend to start implementation very
  soon, as this is an important component of the HA story for TripleO:
 
  https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates
 
  Hi Clint, thanks for pushing this.
 
  First, I don't think RollingUpdatePattern and CanaryUpdatePattern should 
  be 2 different entities. The second just looks like a parametrization of 
  the first (growth_factor=1?).
 
  Perhaps they can just be one. Until I find parameters which would need
  to mean something different, I'll just use UpdatePattern.
 
 
  I then feel that using (abusing?) depends_on for update pattern is a bit 
  weird. Maybe I'm influenced by the CFN design, but the separate 
  UpdatePolicy attribute feels better (although I would probably use a 
  property). I guess my main question is around the meaning of using the 
  update pattern on a server instance. I think I see what you want to do for 
  the group, where child_updating would return a number, but I have no idea 
  what it means for a single resource. Could you detail the operation a bit 
  more in the document?
 
 
  I would be o-k with adding another keyword. The idea in abusing depends_on
  is that it changes the core language less. Properties is definitely out
  for the reasons Christopher brought up; properties are really meant to
  be for the resource's end target only.
 
 Agree, -1 for properties - those belong to the resource, and this data 
 belongs to Heat.
 
  UpdatePolicy in cfn is a single string, and causes very generic rolling
 
 Huh?
 
 http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
 
 Not only is it not just a single string (in fact, it looks a lot like 
 the properties you have defined), it's even got another layer of 
 indirection so you can define different types of update policy (rolling 
 vs. canary, anybody?). It's an extremely flexible syntax.
 

Oops, I relied a little too much on my memory and not enough on docs for
that one. O-k, I will re-evaluate given actual knowledge of how it
actually works. :-P

 BTW, given that we already implemented this in autoscaling, it might be 
 helpful to talk more specifically about what we need to do in addition 
 in order to support the use cases you have in mind.
 

As Robert mentioned in his mail, autoscaling groups won't allow us to
inject individual credentials. With the ResourceGroup, we can make a
nested stack with a random string generator so that is solved. Now the
other piece we need is to be able to directly choose machines to take
out of commission, which I think we may have a simple solution to but I
don't want to derail on that.
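
A minimal sketch of that arrangement, assuming OS::Heat::ResourceGroup
scaling a provider template and OS::Heat::RandomString for the per-member
credential (the file name and most properties are illustrative):

  resources:
    compute_group:
      type: OS::Heat::ResourceGroup
      properties:
        count: 3
        resource_def:
          type: server_with_creds.yaml     # the nested stack below

  # server_with_creds.yaml - instantiated once per group member
  resources:
    creds:
      type: OS::Heat::RandomString         # unique credential per member
      properties:
        length: 32
    server:
      type: OS::Nova::Server
      properties:
        # exactly how the credential reaches the server (user_data,
        # metadata, a keystone user, ...) is beside the point here
        user_data: {get_attr: [creds, value]}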

The one used in AutoScalingGroups is also limited to just one group,
thus it can be done all inside the resource.

  update behavior. I want this resource to be able to control multiple
  groups as if they are one in some cases (Such as a case where a user
  has migrated part of an app to a new type of server, but not all.. so
  they will want to treat the entire aggregate as one rolling update).
 
  I'm o-k with overloading it to allow resource references, but I'd like
  to hear more people take issue with depends_on before I select that
  course.
 
 Resource references in general, and depends_on in particular, feel like 
 very much the wrong abstraction to me. This is a policy, not a resource.
 
  To answer your question, using it with a server instance allows
  rolling updates across non-grouped resources. In the example the
  rolling_update_dbs does this.
 
 That's not a great example, because one DB server depends on the other, 
 forcing them into updating serially anyway.
 

You're right, a better example is a set of (n) resource groups which
serve the same service and thus we want to make sure we maintain the
minimum service levels as a whole.

If it were an order of magnitude harder to do it this way, I'd say
sure let's just expand on the single-resource rolling update. But
I think it won't be that much harder to achieve this and then the use
case is solved.

 I have to say that even in general, this whole idea about applying 
 update policies to non-grouped resources doesn't make a whole lot of 
 sense to me. For non-grouped resources you control the resource 
 definitions individually - if you don't want them to update at a 
 particular time, you have the option of just not updating them.
 

If I have to calculate all the deltas and feed Heat 10 templates, each
with one small delta, I'm writing the same code as I'm proposing for

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Clint Byrum
Excerpts from Steven Dake's message of 2014-02-05 07:35:37 -0800:
 On 02/04/2014 06:34 PM, Robert Collins wrote:
  On 5 February 2014 13:14, Zane Bitter zbit...@redhat.com wrote:
 
 
  That's not a great example, because one DB server depends on the other,
  forcing them into updating serially anyway.
 
  I have to say that even in general, this whole idea about applying update
  policies to non-grouped resources doesn't make a whole lot of sense to me.
  For non-grouped resources you control the resource definitions individually
  - if you don't want them to update at a particular time, you have the
  option of just not updating them.
  Well, I don't particularly like the idea of doing thousands of
  discrete heat stack-update calls, which would seem to be what you're
  proposing.
 
  On groups: autoscale groups are a problem for secure-minded
  deployments because every server has identical resources (today) and
  we very much want discrete credentials per server - at least this is
  my understanding of the reason we're not using scaling groups in
  TripleO.
 
  Where you _do_ need it is for scaling groups where every server is based on
  the same launch config, so you need a way to control the members
  individually - by batching up operations (done), adding delays (done) or,
  even better, notifications and callbacks.
 
  So it seems like doing 'rolling' updates for any random subset of resources
  is effectively turning Heat into something of a poor-man's workflow
  service, and IMHO that is probably a mistake.
  I mean to reply to the other thread, but here is just as good :) -
  heat as a way to describe the intended state, and heat takes care of
  transitions, is a brilliant model. It absolutely implies a bunch of
  workflows - the AWS update policy is probably the key example.
 
  Being able to gracefully, *automatically* work through a transition
  between two defined states, allowing the nodes in question to take
  care of their own needs along the way, seems like a pretty core
  function to fit inside Heat itself. It's not at all the same as 'allow
  users to define arbitrary workflows'.
 
  -Rob
 Rob,
 
 I'm not precisely certain what you're proposing, but I think we need to 
 take care not to turn the Heat DSL into a full-fledged programming 
 language.  IMO thousands of updates done through Heat is a perfect way 
 for a third party service to do such things - e.g. control workflow.  
 Clearly there is a workflow gap in OpenStack, and possibly that thing 
 doing the thousands of updates should be a workflow service, rather than 
 TripleO, but workflow is out of scope for Heat proper.  Such a workflow 
 service could potentially fit in the Orchestration program alongside 
 Heat and Autoscaling.  It is too bad there isn't a workflow service 
 already, because we are getting a lot of pressure to make Heat fill this 
 gap.  I personally believe filling this gap with Heat would be a mistake 
 and the correct course of action would be for a workflow service to 
 emerge to fill this need (and depend on Heat for orchestration).
 

I don't think we want to make it more programmable. I think the opposite:
we want to relieve the template author of workflow concerns by hiding the
common-case workflows behind an update pattern.

To provide some substance to that, if we were to make a workflow service
that does this, it would have to understand templating, and it would
have to understand heat's API. By the time we get done implementing
that, it would look a lot like the resource I've suggested, surrounded
by calls to heatclient and a heat template library.

 I believe this may be what Zane is reacting to; I believe the Heat 
 community would like to avoid making the DSL more programmable because 
 then it is harder to use and support.  The parameters, resources, outputs 
 DSL objects are difficult enough for new folks to pick up, and it's only 3 
 things to understand...

I do agree that keeping this simple to understand from a template author
perspective is extremely important.



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Zane Bitter

On 04/02/14 20:34, Robert Collins wrote:

On 5 February 2014 13:14, Zane Bitter zbit...@redhat.com wrote:



That's not a great example, because one DB server depends on the other,
forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying update
policies to non-grouped resources doesn't make a whole lot of sense to me.
For non-grouped resources you control the resource definitions individually
- if you don't want them to update at a particular time, you have the option
of just not updating them.


Well, I don't particularly like the idea of doing thousands of
discrete heat stack-update calls, which would seem to be what you're
proposing.


I'm not proposing you do it by hand if that's any help ;)

Ideally a workflow service would exist that could do the messy parts for 
you, but at the end of the day it's just a for-loop in your code. From 
what you say below, I think you started down the path of managing a lot 
of complexity yourself when you were forced to generate templates for 
server groups rather than use autoscaling. I think it would be better 
for _everyone_ if we put resources into helping TripleO get off that 
path than into making it less inconvenient to stay on it.



On groups: autoscale groups are a problem for secure-minded
deployments because every server has identical resources (today) and
we very much want discrete credentials per server - at least this is
my understanding of the reason we're not using scaling groups in
TripleO.


OK, I wasn't aware that y'all are not using scaling groups. It sounds 
like this is the real problem we should be addressing, because everyone 
wants secure-minded deployments and nobody wants to have to manually 
define the configs for their 1000 all-but-identical servers. If we had a 
mechanism to ensure that every server in a scaling group could obtain 
its own credentials then it seems to me that the issue of whether to 
apply autoscaling-style rolling upgrades to manually-defined groups of 
resources becomes moot.


(Note: if anybody read that paragraph and started thinking "hey, we 
could make Turing-complete programmable template templates using the 
JSON equivalent of XSLT", please just stop right now kthx.)



Where you _do_ need it is for scaling groups where every server is based on
the same launch config, so you need a way to control the members
individually - by batching up operations (done), adding delays (done) or,
even better, notifications and callbacks.

So it seems like doing 'rolling' updates for any random subset of resources
is effectively turning Heat into something of a poor-man's workflow service,
and IMHO that is probably a mistake.


I mean to reply to the other thread, but here is just as good :) -
heat as a way to describe the intended state, and heat takes care of
transitions, is a brilliant model. It absolutely implies a bunch of
workflows - the AWS update policy is probably the key example.


Absolutely. Orchestration works by building a workflow internally, which 
Heat then also executes. No disagreement there.



Being able to gracefully, *automatically* work through a transition
between two defined states, allowing the nodes in question to take
care of their own needs along the way, seems like a pretty core
function to fit inside Heat itself. It's not at all the same as 'allow
users to define arbitrary workflows'.


That's fair and, I like to think, consistent with what I was suggesting 
below.



What we do need for all resources (not just scaling groups) is a way for the
user to say "for this particular resource, notify me when it has updated
(but, if possible, before we have taken any destructive actions on it), give
me a chance to test it and accept or reject the update". For example, when
you resize a server, give the user a chance to confirm or reject the change
at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
server during an update, give the user a chance to test the new server and
either keep it (continue on and delete the old one) or not (roll back). Or
when you replace a server in a scaling group, notify the load balancer _or
some other thing_ (e.g. OpenShift broker node) that a replacement has been
created and wait for it to switch over to the new one before deleting the
old one. Or, of course, when you update a server to some new config, give
the user a chance to test it out and make sure it works before continuing
with the stack update. All of these use cases can, I think, be solved with a
single feature.

The open questions for me are:
1) How do we notify the user that it's time to check on a resource?
(Marconi?)


This is the graceful update stuff I referred to in my mail to Clint -
the proposal from hallway discussions in HK was to do this by
notifying the server itself (that way we don't create a centralised
point of failure). I can see though that in a general sense not all
resources are servers. 

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-05 Thread Zane Bitter

On 05/02/14 11:39, Clint Byrum wrote:

Excerpts from Zane Bitter's message of 2014-02-04 16:14:09 -0800:

On 03/02/14 17:09, Clint Byrum wrote:

UpdatePolicy in cfn is a single string, and causes very generic rolling


Huh?

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

Not only is it not just a single string (in fact, it looks a lot like
the properties you have defined), it's even got another layer of
indirection so you can define different types of update policy (rolling
vs. canary, anybody?). It's an extremely flexible syntax.



Oops, I relied a little too much on my memory and not enough on docs for
that one. O-k, I will re-evaluate given actual knowledge of how it
actually works. :-P


cheers :D


BTW, given that we already implemented this in autoscaling, it might be
helpful to talk more specifically about what we need to do in addition
in order to support the use cases you have in mind.



As Robert mentioned in his mail, autoscaling groups won't allow us to
inject individual credentials. With the ResourceGroup, we can make a
nested stack with a random string generator so that is solved. Now the


\o/ for the random string generator solving the problem!

:-( for ResourceGroup being the only way to do it.

This is exactly why I hate ResourceGroup and think it was a mistake. 
Powerful software comes from being able to combine simple concepts in 
complex ways. Right now you have to choose between an autoscaling group, 
which has rolling updates, and a ResourceGroup which allows you to scale 
stacks. That sucks. What you need is to have both at the same time, and 
the way to do that is to allow autoscaling groups to scale stacks, as 
has long been planned.


At this point it would be a mistake to add a _complicated_ feature 
solely for the purpose of working around the fact the we can't yet 
combine two other, existing, features. It would be better to fix 
autoscaling groups to allow you to inject individual credentials and 
then add a simpler feature that does not need to create ad-hoc groups.



other piece we need is to be able to directly choose machines to take
out of commission, which I think we may have a simple solution to but I
don't want to derail on that.

The one used in AutoScalingGroups is also limited to just one group,
thus it can be done all inside the resource.


update behavior. I want this resource to be able to control multiple
groups as if they are one in some cases (Such as a case where a user
has migrated part of an app to a new type of server, but not all.. so
they will want to treat the entire aggregate as one rolling update).

I'm o-k with overloading it to allow resource references, but I'd like
to hear more people take issue with depends_on before I select that
course.


Resource references in general, and depends_on in particular, feel like
very much the wrong abstraction to me. This is a policy, not a resource.


To answer your question, using it with a server instance allows
rolling updates across non-grouped resources. In the example the
rolling_update_dbs does this.


That's not a great example, because one DB server depends on the other,
forcing them into updating serially anyway.



You're right, a better example is a set of (n) resource groups which
serve the same service and thus we want to make sure we maintain the
minimum service levels as a whole.


That's interesting, and I'd like to hear more about that use case and 
why it couldn't be solved using autoscaling groups, assuming the obstacle 
to using them at all were eliminated. If there's a real use case here 
beyond "work around the lack of stack-scaling functionality", then I'm 
definitely open to being persuaded. I'd just like to make sure that it 
exists and justifies the extra complexity.



If it were an order of magnitude harder to do it this way, I'd say
sure let's just expand on the single-resource rolling update. But
I think it won't be that much harder to achieve this and then the use
case is solved.


I guess what I'm thinking is that your proposal is really two features:

1) Notifications/callbacks on update that allow the user to hook in to 
the workflow.

2) Rolling updates over ad-hoc groups (not autoscaling groups).

I think we all agree that (1) is needed; by my count ~6 really good use 
cases have been mentioned in this thread.


What I'm suggesting is that we probably don't need to do (2) at all if 
we fix autoscaling groups to be something you could use.


Having reviewed the code for rolling updates in scaling groups, I can 
report that it is painfully complicated and that you'd be doing yourself 
a big favour by not attempting to reimplement it with ad-hoc groups ;). 
(To be fair, I don't think this would be quite as bad, though clearly it 
wouldn't be as good as not having to do it at all.) More concerning than 
that, though, is the way this looks set to make the template format even 
more arcane than it already is. We might eventually be able 

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Thomas Spatzier
Thomas Herve thomas.he...@enovance.com wrote on 03/02/2014 21:46:05:
 From: Thomas Herve thomas.he...@enovance.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org
 Date: 03/02/2014 21:52
 Subject: Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec
 re-written. RFC

  So, I wrote the original rolling updates spec about a year ago, and the
  time has come to get serious about implementation. I went through it and
  basically rewrote the entire thing to reflect the knowledge I have
  gained from a year of working with Heat.
 
  Any and all comments are welcome. I intend to start implementation very
  soon, as this is an important component of the HA story for TripleO:
 
  https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates

 Hi Clint, thanks for pushing this.

 First, I don't think RollingUpdatePattern and CanaryUpdatePattern
 should be 2 different entities. The second just looks like a
 parametrization of the first (growth_factor=1?).

 I then feel that using (abusing?) depends_on for update pattern is a
 bit weird. Maybe I'm influenced by the CFN design, but the separate
 UpdatePolicy attribute feels better (although I would probably use a
 property). I guess my main question is around the meaning of using
 the update pattern on a server instance. I think I see what you want
 to do for the group, where child_updating would return a number, but
 I have no idea what it means for a single resource. Could you detail
 the operation a bit more in the document?

I also think that the depends_on feels a bit weird. In most use cases
depends_on is more about waiting for some other resource to be ready, but
for rolling updates the resource is more a data container (a policy)
that is just there - that's at least how I understand it from a user's
perspective. So referring to that resource via a special property would look
more intuitive to me.

That would also be in line with other cases already implemented: an
InstanceGroup that points to its LaunchConfiguration; a SoftwareDeployment
that points to a SoftwareConfiguration.
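
In template terms, that precedent would suggest something like this
(HOT-style sketch; update_pattern is a made-up property name):

  resources:
    web_update_pattern:
      type: OS::Heat::RollingUpdatePattern   # from the spec
      properties:
        min_in_service: 1

    web_server:
      type: OS::Nova::Server
      properties:
        # point at the policy the way an InstanceGroup points at its
        # LaunchConfiguration, rather than overloading depends_on
        update_pattern: {get_resource: web_update_pattern}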


 It also seems that the interface you're creating (child_creating/
 child_updating) is fairly specific to your use case. For autoscaling
 we have a need for a more generic notification system; it would be
 nice to find common ground. Maybe we can invert the relationship?
 Add a notified_resources attribute, which would call hooks on the
 parent when actions are happening.

 Thanks,

 --
 Thomas



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Thomas Herve

  I then feel that using (abusing?) depends_on for update pattern is a bit
  weird. Maybe I'm influenced by the CFN design, but the separate
  UpdatePolicy attribute feels better (although I would probably use a
  property). I guess my main question is around the meaning of using the
  update pattern on a server instance. I think I see what you want to do for
  the group, where child_updating would return a number, but I have no idea
  what it means for a single resource. Could you detail the operation a bit
  more in the document?
  
 
 I would be o-k with adding another keyword. The idea in abusing depends_on
 is that it changes the core language less. Properties is definitely out
 for the reasons Christopher brought up; properties are really meant to
 be for the resource's end target only.

OK. Part of my confusion is that I didn't really understand the relationship 
you're trying to build.

 UpdatePolicy in cfn is a single string, and causes very generic rolling
 update behavior. I want this resource to be able to control multiple
 groups as if they are one in some cases (Such as a case where a user
 has migrated part of an app to a new type of server, but not all.. so
 they will want to treat the entire aggregate as one rolling update).

We don't need to have the same restriction as CFN.

 I'm o-k with overloading it to allow resource references, but I'd like
 to hear more people take issue with depends_on before I select that
 course.
 
 To answer your question, using it with a server instance allows
 rolling updates across non-grouped resources. In the example the
 rolling_update_dbs does this.

I think I start to understand. The depends_on implicitly creates a group of 
children with the parent being your rolling update resource.

  It also seems that the interface you're creating
  (child_creating/child_updating) is fairly specific to your use case. For
  autoscaling we have a need for a more generic notification system; it would
  be nice to find common ground. Maybe we can invert the relationship? Add
  a notified_resources attribute, which would call hooks on the parent
  when actions are happening.
  
 
 I'm open to a different interface design. I don't really have a firm
 grasp of the generic behavior you'd like to model though. This is quite
 concrete and would be entirely hidden from template authors, though not
 from resource plugin authors. Attributes sound like something where you
 want the template authors to get involved in specifying, but maybe that
 was just an overloaded term.
 
 So perhaps we can replace this interface with the generic one when your
 use case is more clear?

Sure, I don't want to block you. I think I'd rather have another name than 
depends_on, to make it clearer what it means, even if as an implementation 
detail they are roughly the same. Or maybe we shouldn't use depends_on in HOT. 
Something like parent_resources would seem closer to what we want.
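
For instance (keyword name as suggested above, purely illustrative):

  db_server_1:
    type: OS::Nova::Server
    parent_resources: [rolling_update_dbs]   # hypothetical replacement for
                                             # the depends_on overload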

-- 
Thomas



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Zane Bitter

On 03/02/14 17:09, Clint Byrum wrote:

Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:

So, I wrote the original rolling updates spec about a year ago, and the
time has come to get serious about implementation. I went through it and
basically rewrote the entire thing to reflect the knowledge I have
gained from a year of working with Heat.

Any and all comments are welcome. I intend to start implementation very
soon, as this is an important component of the HA story for TripleO:

https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates


Hi Clint, thanks for pushing this.

First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
different entities. The second just looks like a parametrization of the first 
(growth_factor=1?).


Perhaps they can just be one. Until I find parameters which would need
to mean something different, I'll just use UpdatePattern.



I then feel that using (abusing?) depends_on for update pattern is a bit weird. 
Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute 
feels better (although I would probably use a property). I guess my main 
question is around the meaning of using the update pattern on a server 
instance. I think I see what you want to do for the group, where child_updating 
would return a number, but I have no idea what it means for a single resource. 
Could you detail the operation a bit more in the document?



I would be o-k with adding another keyword. The idea in abusing depends_on
is that it changes the core language less. Properties is definitely out
for the reasons Christopher brought up; properties are really meant to
be for the resource's end target only.


Agree, -1 for properties - those belong to the resource, and this data 
belongs to Heat.



UpdatePolicy in cfn is a single string, and causes very generic rolling


Huh?

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

Not only is it not just a single string (in fact, it looks a lot like 
the properties you have defined), it's even got another layer of 
indirection so you can define different types of update policy (rolling 
vs. canary, anybody?). It's an extremely flexible syntax.
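
For reference, an UpdatePolicy on an AWS::AutoScaling::AutoScalingGroup
looks roughly like this (a YAML rendering of the JSON in the page above;
see the AWS docs for the authoritative key set):

  MyServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      # launch configuration, size, etc. omitted
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M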


BTW, given that we already implemented this in autoscaling, it might be 
helpful to talk more specifically about what we need to do in addition 
in order to support the use cases you have in mind.



update behavior. I want this resource to be able to control multiple
groups as if they are one in some cases (Such as a case where a user
has migrated part of an app to a new type of server, but not all.. so
they will want to treat the entire aggregate as one rolling update).

I'm o-k with overloading it to allow resource references, but I'd like
to hear more people take issue with depends_on before I select that
course.


Resource references in general, and depends_on in particular, feel like 
very much the wrong abstraction to me. This is a policy, not a resource.



To answer your question, using it with a server instance allows
rolling updates across non-grouped resources. In the example the
rolling_update_dbs does this.


That's not a great example, because one DB server depends on the other, 
forcing them into updating serially anyway.


I have to say that even in general, this whole idea about applying 
update policies to non-grouped resources doesn't make a whole lot of 
sense to me. For non-grouped resources you control the resource 
definitions individually - if you don't want them to update at a 
particular time, you have the option of just not updating them.


Where you _do_ need it is for scaling groups where every server is based 
on the same launch config, so you need a way to control the members 
individually - by batching up operations (done), adding delays (done) 
or, even better, notifications and callbacks.


So it seems like doing 'rolling' updates for any random subset of 
resources is effectively turning Heat into something of a poor-man's 
workflow service, and IMHO that is probably a mistake.


What we do need for all resources (not just scaling groups) is a way for 
the user to say "for this particular resource, notify me when it has 
updated (but, if possible, before we have taken any destructive actions 
on it), give me a chance to test it and accept or reject the update". 
For example, when you resize a server, give the user a chance to confirm 
or reject the change at the VERIFY_RESIZE step (Trove requires this). Or 
when you replace a server during an update, give the user a chance to 
test the new server and either keep it (continue on and delete the old 
one) or not (roll back). Or when you replace a server in a scaling 
group, notify the load balancer _or some other thing_ (e.g. OpenShift 
broker node) that a replacement has been created and wait for it to 
switch over to the new one before deleting the old one. Or, of course, 
when you update a server to some new config, give the user a chance to 
test it out and make sure it works before continuing with the stack 
update. All of these use cases can, I think, be solved with a single 
feature.

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Christopher Armstrong
On Tue, Feb 4, 2014 at 6:14 PM, Zane Bitter zbit...@redhat.com wrote:

 On 03/02/14 17:09, Clint Byrum wrote:

 Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:



  update behavior. I want this resource to be able to control multiple
 groups as if they are one in some cases (Such as a case where a user
 has migrated part of an app to a new type of server, but not all.. so
 they will want to treat the entire aggregate as one rolling update).

 I'm o-k with overloading it to allow resource references, but I'd like
 to hear more people take issue with depends_on before I select that
 course.


 Resource references in general, and depends_on in particular, feel like
 very much the wrong abstraction to me. This is a policy, not a resource.


  To answer your question, using it with a server instance allows
 rolling updates across non-grouped resources. In the example the
 rolling_update_dbs does this.


 That's not a great example, because one DB server depends on the other,
 forcing them into updating serially anyway.

 I have to say that even in general, this whole idea about applying update
 policies to non-grouped resources doesn't make a whole lot of sense to me.
 For non-grouped resources you control the resource definitions individually
 - if you don't want them to update at a particular time, you have the
 option of just not updating them.

 Where you _do_ need it is for scaling groups where every server is based
 on the same launch config, so you need a way to control the members
 individually - by batching up operations (done), adding delays (done) or,
 even better, notifications and callbacks.

 So it seems like doing 'rolling' updates for any random subset of
 resources is effectively turning Heat into something of a poor-man's
 workflow service, and IMHO that is probably a mistake.

 What we do need for all resources (not just scaling groups) is a way for
 the user to say "for this particular resource, notify me when it has
 updated (but, if possible, before we have taken any destructive actions on
 it), give me a chance to test it and accept or reject the update". For
 example, when you resize a server, give the user a chance to confirm or
 reject the change at the VERIFY_RESIZE step (Trove requires this). Or when
 you replace a server during an update, give the user a chance to test the
 new server and either keep it (continue on and delete the old one) or not
 (roll back). Or when you replace a server in a scaling group, notify the
 load balancer _or some other thing_ (e.g. OpenShift broker node) that a
 replacement has been created and wait for it to switch over to the new one
 before deleting the old one. Or, of course, when you update a server to
 some new config, give the user a chance to test it out and make sure it
 works before continuing with the stack update. All of these use cases can,
 I think, be solved with a single feature.

 The open questions for me are:
 1) How do we notify the user that it's time to check on a resource?
 (Marconi?)
 2) How does the user ack/nack? (You're suggesting reusing WaitCondition,
 and that makes sense to me.)
 3) How do we break up the operations so the notification occurs at the
 right time? (With difficulty, but it should be do-able.)
 4) How does the user indicate for which resources they want to be
 notified? (Inside an update_policy? Another new directive at the
 type/properties/depends_on/update_policy level?)


To relate this to another interesting feature, I think it would also be
super awesome if Heat grew the ability to support remotely-hosted resource
*types* (in addition to the resource notifications you're talking about) by
way of an API over Marconi (or maybe just a simple REST API that Heat would
invoke). I'm pretty sure CFN has something like this, too, using their
queue service. And I think their thing has the custom code ACK back over
the queue service to indicate that operations are complete, fwiw.



  It also seems that the interface you're creating
  (child_creating/child_updating) is fairly specific to your use case. For
  autoscaling we have a need for a more generic notification system; it would
  be nice to find common ground. Maybe we can invert the relationship? Add a
  notified_resources attribute, which would call hooks on the parent when
  actions are happening.
 I'm open to a different interface design. I don't really have a firm
 grasp of the generic behavior you'd like to model though. This is quite
 concrete and would be entirely hidden from template authors, though not
 from resource plugin authors. Attributes sound like something where you
 want the template authors to get involved in specifying, but maybe that
 was just an overloaded term.

 So perhaps we can replace this interface with the generic one when your
 use case is more clear?


 I'm not sure about the implementation Thomas proposed, but I believe the
 use case he has in mind is the third of the four I listed above (replace a
 server in a scaling group).

Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Robert Collins
On 4 February 2014 08:51, Clint Byrum cl...@fewbar.com wrote:
 Excerpts from Robert Collins's message of 2014-02-03 10:47:06 -0800:
 Quick thoughts:

  - I'd like to be able to express a minimum service percentage: e.g. I
 know I need 80% of my capacity available at any one time, so an
 additional constraint to the unit counts is to stay below 20% down at
 a time (and this implies that if 20% have failed, either stop or spin
 up more nodes before continuing).


 Right, will add that.

Thanks.

 One thing though, all failures lead to rollback. I put that in the
 'Unresolved issues' section. Continuing a group operation with any
 failures is an entirely different change to Heat. We have a few choices,
 from a whole re-thinking of how we handle failures, to just a special
 type of resource group that tolerates failure percentages.

Let's tackle that in a future iteration - it seems orthogonal to me.

 The wait condition stuff seems to be conflated with the 'graceful
 operations' stuff we discussed briefly at the summit, which in my head
 at least is an entirely different thing - it's per node rather than
 per group. If done separately that might make each feature
 substantially easier to reason about.

 Agreed. I think something more generic than an actual Heat wait condition
 would make more sense. Perhaps even returning all of the active scheduler
 tasks which the update must wait on would make sense. Then in the
 graceful update version we can just make the dynamically created wait
 conditions depend on the update pattern, which would have the same effect.

It's not clear to me what would be more generic than a Heat wait condition ;).

 With the maximum-out-of-service addition, we'll also need to make sure
 that once the things we must wait for have completed, we evaluate state
 again before letting the update proceed.

How about - sketching:

Resources:
  NovaCompute0:
Type: OS::Nova::Server
Properties:
  action-readiness:
handle: MyWaitConditionHandle
notify-path: NovaCompute0Config.Metadata.heat-action
  NovaCompute0Config:
Type: AWS::AutoScaling::LaunchConfiguration
...

Alternatively, have something in the NovaCompute0Config.Metadata
section that is identified as Heat's signal-to-us marker.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Robert Collins
On 5 February 2014 13:14, Zane Bitter zbit...@redhat.com wrote:


 That's not a great example, because one DB server depends on the other,
 forcing them into updating serially anyway.

 I have to say that even in general, this whole idea about applying update
 policies to non-grouped resources doesn't make a whole lot of sense to me.
 For non-grouped resources you control the resource definitions individually
 - if you don't want them to update at a particular time, you have the option
 of just not updating them.

Well, I don't particularly like the idea of doing thousands of
discrete heat stack-update calls, which would seem to be what you're
proposing.

On groups: autoscale groups are a problem for secure-minded
deployments because every server has identical resources (today) and
we very much want discrete credentials per server - at least this is
my understanding of the reason we're not using scaling groups in
TripleO.

 Where you _do_ need it is for scaling groups where every server is based on
 the same launch config, so you need a way to control the members
 individually - by batching up operations (done), adding delays (done) or,
 even better, notifications and callbacks.

 So it seems like doing 'rolling' updates for any random subset of resources
 is effectively turning Heat into something of a poor-man's workflow service,
 and IMHO that is probably a mistake.

I mean to reply to the other thread, but here is just as good :) -
heat as a way to describe the intended state, and heat takes care of
transitions, is a brilliant model. It absolutely implies a bunch of
workflows - the AWS update policy is probably the key example.

Being able to gracefully, *automatically* work through a transition
between two defined states, allowing the nodes in question to take
care of their own needs along the way, seems like a pretty core
function to fit inside Heat itself. It's not at all the same as 'allow
users to define arbitrary workflows'.

-Rob

 What we do need for all resources (not just scaling groups) is a way for the
 user to say "for this particular resource, notify me when it has updated
 (but, if possible, before we have taken any destructive actions on it), give
 me a chance to test it and accept or reject the update". For example, when
 you resize a server, give the user a chance to confirm or reject the change
 at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
 server during an update, give the user a chance to test the new server and
 either keep it (continue on and delete the old one) or not (roll back). Or
 when you replace a server in a scaling group, notify the load balancer _or
 some other thing_ (e.g. OpenShift broker node) that a replacement has been
 created and wait for it to switch over to the new one before deleting the
 old one. Or, of course, when you update a server to some new config, give
 the user a chance to test it out and make sure it works before continuing
 with the stack update. All of these use cases can, I think, be solved with a
 single feature.

 The open questions for me are:
 1) How do we notify the user that it's time to check on a resource?
 (Marconi?)

This is the graceful update stuff I referred to in my mail to Clint -
the proposal from hallway discussions in HK was to do this by
notifying the server itself (that way we don't create a centralised
point of failure). I can see though that in a general sense not all
resources are servers. But - how about allowing to specify where to
notify (and notifying is always done by setting a value in metadata
somewhere) - users can then pull that out themselves however they want
to. Adding push notifications is orthogonal IMO - we'd like that for
all metadata changes, for instance.

 2) How does the user ack/nack? (You're suggesting reusing WaitCondition, and
 that makes sense to me.)

The server would use a WaitCondition yes.

 3) How do we break up the operations so the notification occurs at the right
 time? (With difficulty, but it should be do-able.)

Just wrap the existing operations - if 'should notify', then:
notify-wait-do; otherwise just do.
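
A sketch of what that wrapping could look like at the template level,
reusing the wait condition resources Heat already has (resource names and
the metadata marker are illustrative):

  resources:
    update_handle:
      type: AWS::CloudFormation::WaitConditionHandle

    update_ack:
      type: AWS::CloudFormation::WaitCondition
      properties:
        Handle: {get_resource: update_handle}
        Timeout: '600'

    compute_server:
      type: OS::Nova::Server
      properties:
        metadata:
          # notify: Heat sets this marker before acting on the server;
          # wait: the in-instance agent quiesces and signals update_handle;
          # do: Heat then performs the actual update
          heat-action: pending-update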

 4) How does the user indicate for which resources they want to be notified?
 (Inside an update_policy? Another new directive at the
 type/properties/depends_on/update_policy level?)

I would say per resource.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-04 Thread Christopher Armstrong
On Tue, Feb 4, 2014 at 7:34 PM, Robert Collins robe...@robertcollins.net wrote:

 On 5 February 2014 13:14, Zane Bitter zbit...@redhat.com wrote:


  That's not a great example, because one DB server depends on the other,
  forcing them into updating serially anyway.
 
  I have to say that even in general, this whole idea about applying update
  policies to non-grouped resources doesn't make a whole lot of sense to me.
  For non-grouped resources you control the resource definitions individually
  - if you don't want them to update at a particular time, you have the
  option of just not updating them.

 Well, I don't particularly like the idea of doing thousands of
 discrete heat stack-update calls, which would seem to be what you're
 proposing.

 On groups: autoscale groups are a problem for secure-minded
 deployments because every server has identical resources (today) and
 we very much want discrete credentials per server - at least this is
 my understanding of the reason we're not using scaling groups in
 TripleO.

  Where you _do_ need it is for scaling groups where every server is based on
  the same launch config, so you need a way to control the members
  individually - by batching up operations (done), adding delays (done) or,
  even better, notifications and callbacks.

  So it seems like doing 'rolling' updates for any random subset of resources
  is effectively turning Heat into something of a poor-man's workflow service,
  and IMHO that is probably a mistake.

 I mean to reply to the other thread, but here is just as good :) -
 heat as a way to describe the intended state, and heat takes care of
 transitions, is a brilliant model. It absolutely implies a bunch of
 workflows - the AWS update policy is probably the key example.

 Being able to gracefully, *automatically* work through a transition
 between two defined states, allowing the nodes in question to take
 care of their own needs along the way, seems like a pretty core
 function to fit inside Heat itself. It's not at all the same as 'allow
 users to define arbitrary workflows'.

 -Rob


Agreed. I have been assuming that the autoscaling service outside of the
Heat engine would need to send several pre-calculated template changes in
sequence in order to implement rolling updates for resource groups, but I
think it would be much much better if Heat could take care of this as a
core feature.



-- 
Christopher Armstrong
http://twitter.com/radix/
http://github.com/radix/
http://radix.twistedmatrix.com/
http://planet-if.com/


[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Clint Byrum
So, I wrote the original rolling updates spec about a year ago, and the
time has come to get serious about implementation. I went through it and
basically rewrote the entire thing to reflect the knowledge I have
gained from a year of working with Heat.

Any and all comments are welcome. I intend to start implementation very
soon, as this is an important component of the HA story for TripleO:

https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Robert Collins
Quick thoughts:

 - I'd like to be able to express a minimum service percentage: e.g. I
know I need 80% of my capacity available at any one time, so an
additional constraint to the unit counts is to stay below 20% down at
a time (and this implies that if 20% have failed, either stop or spin
up more nodes before continuing).
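
A rough sketch of expressing that constraint as a pattern property (the
property name is illustrative):

  update_pattern:
    type: OS::Heat::RollingUpdatePattern     # from the spec
    properties:
      min_in_service_percent: 80   # hypothetical: with 10 servers, at most
                                   # 2 may be down at once; if 2 have already
                                   # failed, pause or spin up more capacity
                                   # before taking any more down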

The wait condition stuff seems to be conflated with the 'graceful
operations' stuff we discussed briefly at the summit, which in my head
at least is an entirely different thing - it's per node rather than
per group. If done separately that might make each feature
substantially easier to reason about.

-Rob

On 4 February 2014 06:52, Clint Byrum cl...@fewbar.com wrote:
 So, I wrote the original rolling updates spec about a year ago, and the
 time has come to get serious about implementation. I went through it and
 basically rewrote the entire thing to reflect the knowledge I have
 gained from a year of working with Heat.

 Any and all comments are welcome. I intend to start implementation very
 soon, as this is an important component of the HA story for TripleO:

 https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates




-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Clint Byrum
Excerpts from Robert Collins's message of 2014-02-03 10:47:06 -0800:
 Quick thoughts:
 
  - I'd like to be able to express a minimum service percentage: e.g. I
 know I need 80% of my capacity available at any one time, so an
 additional constraint to the unit counts is to stay below 20% down at
 a time (and this implies that if 20% have failed, either stop or spin
 up more nodes before continuing).
 

Right, will add that.

One thing though, all failures lead to rollback. I put that in the
'Unresolved issues' section. Continuing a group operation with any
failures is an entirely different change to Heat. We have a few choices,
from a whole re-thinking of how we handle failures, to just a special
type of resource group that tolerates failure percentages.

 The wait condition stuff seems to be conflated with the 'graceful
 operations' stuff we discussed briefly at the summit, which in my head
 at least is an entirely different thing - it's per node rather than
 per group. If done separately that might make each feature
 substantially easier to reason about.

Agreed. I think something more generic than an actual Heat wait condition
would make more sense - perhaps even returning all of the active scheduler
tasks which the update must wait on. Then in the graceful update version
we can just make the dynamically created wait conditions depend on the
update pattern, which would have the same effect.

With the maximum out of service addition, we'll also need to make sure
that once the things we must wait on have completed, we evaluate state
again before letting the update proceed.
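
To make that slightly more concrete, a sketch of the shape this could
take - OS::Heat::UpdatePattern is a hypothetical type name, and in
practice the wait conditions would be created dynamically per node
rather than written by the template author:

  resources:
    update_pattern:
      type: OS::Heat::UpdatePattern            # hypothetical type name
    db1_drained_handle:
      type: AWS::CloudFormation::WaitConditionHandle
    db1_drained:                               # imagined as created
      type: AWS::CloudFormation::WaitCondition # dynamically, one per node
      depends_on: update_pattern               # being updated
      properties:
        Handle: {get_resource: db1_drained_handle}
        Timeout: 600                           # seconds allowed for the
                                               # node's graceful drain signal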



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Thomas Herve
 So, I wrote the original rolling updates spec about a year ago, and the
 time has come to get serious about implementation. I went through it and
 basically rewrote the entire thing to reflect the knowledge I have
 gained from a year of working with Heat.
 
 Any and all comments are welcome. I intend to start implementation very
 soon, as this is an important component of the HA story for TripleO:
 
 https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates

Hi Clint, thanks for pushing this.

First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
different entities. The second just looks like a parametrization of the first 
(growth_factor=1?).
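
To illustrate: with a hypothetical growth_factor property on a single
pattern type (names invented, not from the spec), both behaviors fall
out of one resource:

  resources:
    canary_update:
      type: OS::Heat::UpdatePattern   # hypothetical unified type
      properties:
        batch_size: 1                 # start with a single canary
        growth_factor: 1              # batches never grow: pure canary
    rolling_update:
      type: OS::Heat::UpdatePattern
      properties:
        batch_size: 1
        growth_factor: 2              # batches of 1, 2, 4, 8... for an
                                      # accelerating rollout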

I then feel that using (abusing?) depends_on for update pattern is a bit weird. 
Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute 
feels better (although I would probably use a property). I guess my main 
question is around the meaning of using the update pattern on a server 
instance. I think I see what you want to do for the group, where child_updating 
would return a number, but I have no idea what it means for a single resource. 
Could you detail the operation a bit more in the document?

It also seems that the interface you're creating
(child_creating/child_updating) is fairly specific to your use case. For
autoscaling we have a need for a more generic notification system; it
would be nice to find common ground. Maybe we can invert the
relationship? Add a notified_resources attribute, which would call hooks
on the parent when actions are happening.
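
Roughly, instead of the parent naming its children, the child would name
the resources to notify - a sketch with invented names:

  resources:
    update_pattern:
      type: OS::Heat::UpdatePattern            # hypothetical
    server_1:
      type: OS::Nova::Server                   # image/flavor elided
      notified_resources: [update_pattern]     # hooks fire on the parent
                                               # when server_1 is created,
                                               # updated, or deleted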

Thanks,

-- 
Thomas 



Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Christopher Armstrong
Heya Clint, this BP looks really good - it should significantly simplify
the implementation of scaling if this becomes a core Heat feature. Comments
below.

On Mon, Feb 3, 2014 at 2:46 PM, Thomas Herve thomas.he...@enovance.com wrote:

  So, I wrote the original rolling updates spec about a year ago, and the
  time has come to get serious about implementation. I went through it and
  basically rewrote the entire thing to reflect the knowledge I have
  gained from a year of working with Heat.
 
  Any and all comments are welcome. I intend to start implementation very
  soon, as this is an important component of the HA story for TripleO:
 
  https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates

 Hi Clint, thanks for pushing this.

 First, I don't think RollingUpdatePattern and CanaryUpdatePattern should
 be 2 different entities. The second just looks like a parametrization of
 the first (growth_factor=1?).


Agreed.



 I then feel that using (abusing?) depends_on for update pattern is a bit
 weird. Maybe I'm influenced by the CFN design, but the separate
 UpdatePolicy attribute feels better (although I would probably use a
 property). I guess my main question is around the meaning of using the
 update pattern on a server instance. I think I see what you want to do for
 the group, where child_updating would return a number, but I have no idea
 what it means for a single resource. Could you detail the operation a bit
 more in the document?



I agree that depends_on is weird and I think it should be avoided. I'm not
sure a property is the right decision, though, assuming that it's the heat
engine that's dealing with the rolling updates -- I think having the engine
reach into a resource's properties would set a strange precedent. The CFN
design does seem pretty reasonable to me, assuming an update_policy field
in a HOT resource, referring to the policy that the resource should use.
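
For example (speculative HOT syntax, with update_policy as a top-level
resource key that the engine interprets, rather than a property):

  resources:
    update_pattern:
      type: OS::Heat::UpdatePattern              # hypothetical pattern type
      properties:
        batch_size: 1
        growth_factor: 2
    web_group:
      type: OS::Heat::InstanceGroup              # group properties elided
      update_policy:
        pattern: {get_resource: update_pattern}  # engine-level reference,
                                                 # not visible to the resource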


It also seems that the interface you're creating
 (child_creating/child_updating) is fairly specific to your use case. For
 autoscaling we have a need for a more generic notification system; it
 would be nice to find common ground. Maybe we can invert the
 relationship? Add a notified_resources attribute, which would call hooks
 on the parent when actions are happening.



Yeah, this would be really helpful for stuff like load balancer
notifications (and any of a number of different resource relationships).

-- 
IRC: radix
http://twitter.com/radix
Christopher Armstrong
Rackspace


Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

2014-02-03 Thread Clint Byrum
Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
  So, I wrote the original rolling updates spec about a year ago, and the
  time has come to get serious about implementation. I went through it and
  basically rewrote the entire thing to reflect the knowledge I have
  gained from a year of working with Heat.
  
  Any and all comments are welcome. I intend to start implementation very
  soon, as this is an important component of the HA story for TripleO:
  
  https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates
 
 Hi Clint, thanks for pushing this.
 
 First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 
 different entities. The second just looks like a parametrization of the first 
 (growth_factor=1?).

Perhaps they can just be one. Until I find parameters which would need
to mean something different, I'll just use UpdatePattern.

 
 I then feel that using (abusing?) depends_on for update pattern is a bit 
 weird. Maybe I'm influenced by the CFN design, but the separate UpdatePolicy 
 attribute feels better (although I would probably use a property). I guess my 
 main question is around the meaning of using the update pattern on a server 
 instance. I think I see what you want to do for the group, where 
 child_updating would return a number, but I have no idea what it means for a 
 single resource. Could you detail the operation a bit more in the document?
 

I would be o-k with adding another keyword. The idea in abusing depends_on
is that it changes the core language less. Properties are definitely out
for the reasons Christopher brought up; properties are really meant to
describe the resource's end target only.

UpdatePolicy in cfn is a single attribute, and causes very generic rolling
update behavior. I want this resource to be able to control multiple
groups as if they were one in some cases (such as when a user has
migrated part of an app to a new type of server, but not all of it, and
wants to treat the entire aggregate as one rolling update).
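
For example, two groups sharing one pattern, so old-style and new-style
servers roll as a single aggregate (sketch only; the pattern type name
is hypothetical and group properties are elided):

  resources:
    app_update_pattern:
      type: OS::Heat::UpdatePattern     # hypothetical type name
      properties:
        batch_size: 1
    old_servers:
      type: OS::Heat::InstanceGroup     # properties elided
      depends_on: app_update_pattern    # both groups follow the one pattern,
    new_servers:                        # so the update is sequenced across
      type: OS::Heat::InstanceGroup     # the combined membership
      depends_on: app_update_pattern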

I'm o-k with overloading it to allow resource references, but I'd like
to hear more people take issue with depends_on before I select that
course.

To answer your question: using it with a server instance allows rolling
updates across non-grouped resources. In the example, rolling_update_dbs
does this.
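
For readers without the spec open, a hedged reconstruction of the shape
of that example - the real one is in the wiki page linked above, and the
pattern type name here is invented:

  resources:
    rolling_update_dbs:
      type: OS::Heat::UpdatePattern     # hypothetical type name
      properties:
        batch_size: 1                   # one DB server at a time
    db_server_1:
      type: OS::Nova::Server            # image/flavor elided
      depends_on: rolling_update_dbs    # the pattern serializes updates
    db_server_2:                        # across these otherwise
      type: OS::Nova::Server            # ungrouped servers
      depends_on: rolling_update_dbs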

 It also seems that the interface you're creating
 (child_creating/child_updating) is fairly specific to your use case. For
 autoscaling we have a need for a more generic notification system; it
 would be nice to find common ground. Maybe we can invert the
 relationship? Add a notified_resources attribute, which would call hooks
 on the parent when actions are happening.
 

I'm open to a different interface design, but I don't really have a firm
grasp of the generic behavior you'd like to model. This one is quite
concrete and would be entirely hidden from template authors, though not
from resource plugin authors. Attributes sound like something you'd want
template authors to get involved in specifying, but maybe that was just
an overloaded term.

So perhaps we can replace this interface with the generic one when your
use case is more clear?
