I'm not sure how to feel about this... It's clever... It kind of feels like you're really trying to be able to register 'actions' in Heat so that Heat users can poke the VMs to do something... for example, "perform a Chef run".
While using stack updates as described below could be made to work, is that trying to fit a square peg into a round hole? Would it be better to add a separate API for that? Maybe in the end, though, it's just a matter of what command the user runs rather than how it gets things done; it may be the same under the hood.

What about multiple update actions? Perhaps some types of updates could be run in parallel while others must be done serially. How would you let the autoscaling group know which updates can run which way?

As for ResourceGroup vs. AutoscalingGroup, it would be really good for ResourceGroup to support rolling updates properly too. Would it be very difficult to implement it there as well?

While having the updates happen in template dependency order is interesting, is that really the correct thing to do? Why not reverse order? I'm guessing it may depend entirely on the software. Maybe some app needs the clients upgraded before the server, or the server upgraded before the clients? It may even be version specific. There may even be some steps where it isn't obvious where they should run: "update the clients, upgrade the server packages, stop the servers, run the db upgrade script on one of the servers, start up all the servers".

Maybe this is a good place to hook Mistral and Heat together. Heat would have an API that allows actions to be performed on VMs, with no ordering of its own. Mistral could then poke the Heat actions API for the stack in order to assemble workflows... Or, for tighter integration, maybe a CompoundAction resource is created that really is a Mistral workflow poking the actions API, with the workflow exposed right back through the Heat actions API so users could invoke complicated workflows the same way as simple ones...
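Purely as an illustration of that ordering problem, a workflow like the one above might be sketched in the Mistral v2 DSL roughly as follows. Everything in this sketch is made up: the workflow and task names are placeholders, and the std.echo actions merely stand in for whatever per-server action API Heat might end up exposing.

    ---
    version: '2.0'

    overcloud_software_upgrade:
      type: direct
      description: Hypothetical ordered upgrade assembled outside of Heat.
      tasks:
        update_clients:
          # Placeholder; would really call the (proposed) Heat actions API.
          action: std.echo output="updating clients"
          on-success:
            - upgrade_server_packages
        upgrade_server_packages:
          action: std.echo output="upgrading server packages"
          on-success:
            - stop_servers
        stop_servers:
          action: std.echo output="stopping servers"
          on-success:
            - run_db_upgrade
        run_db_upgrade:
          # Would run on exactly one server before the rest restart.
          action: std.echo output="running db upgrade script"
          on-success:
            - start_servers
        start_servers:
          action: std.echo output="starting all servers"

Steps chained with on-success run serially; tasks with no link between them would start in parallel, which is one way the serial-vs-parallel distinction above could be expressed.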
Thanks,
Kevin

________________________________________
From: Zane Bitter [[email protected]]
Sent: Thursday, April 02, 2015 3:31 PM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [TripleO][Heat] Overcloud software updates and ResourceGroups

A few of us have been looking for a way to perform software updates to servers in a TripleO Heat/Puppet-based overcloud that avoids an impedance mismatch with Heat concepts and how Heat runs its workflow. As many talented TripleO-ers who have gone before can probably testify, that's surprisingly difficult to do, but we did come up with an idea that I think might work and which I'd like to get wider feedback on. For clarity, I'm speaking here in the context of the new overcloud-without-mergepy templates.

The idea is that we create a SoftwareConfig that, when run, can update some software on the server. (The exact mechanism for the update is not important for this discussion; suffice to say that in principle it could be as simple as "[yum|apt-get] update".) The SoftwareConfig would have at least one input, though it need not do anything with the value. Then each server has that config deployed to it with a SoftwareDeployment at the time it is created. However, it is set to execute only on the UPDATE action. The value of (one of) the input(s) is obtained from a parameter.

As a result, we can trigger the software update by simply changing the value of the input parameter, and the regular Heat dependency graph will be respected. The actual input value could be by convention a uuid, a timestamp, a random string, or just about anything so long as it changes.

Here's a trivial example of what this deployment might look like:

  update_config:
    type: OS::Heat::SoftwareConfig
    properties:
      config: {get_file: do_sw_update.sh}
      inputs:
        - name: update_after_time
          description: Timestamp of the most recent update request

  update_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      actions:
        - UPDATE
      config: {get_resource: update_config}
      server: {get_resource: my_server}
      input_values:
        update_after_time: {get_param: update_timestamp}

(A possible future enhancement is that if you keep a mapping between previous input values and the system state after the corresponding update, you could even automatically handle rollbacks in the event the user decided to cancel the update.)

And now we should be able to trigger an update to all of our servers, in the regular Heat dependency order, by simply (thanks to the fact that parameters now keep their previous values on stack updates unless they're explicitly changed) running a command like:

  heat stack-update my_overcloud -f $TMPL -P "update_timestamp=$(date)"

(A future goal of Heat is to make specifying the template again optional too... I don't think that change landed yet, but in this case we can always obtain the template from Tuskar, so it's not so bad.)

Astute readers may have noticed that this does not actually solve our problem. In reality groups of similar servers are deployed within ResourceGroups and there are no dependencies between the members. So, for example, all of the controller nodes would be updated in parallel, with the likely result that the overcloud could be unavailable for some time even if it is deployed with HA.

The good news is that a solution to this problem is already implemented in Heat: rolling updates. For example, the controller node availability problem can be solved by setting a rolling update batch size of 1. The bad news is that rolling updates are implemented only for AutoscalingGroups, not ResourceGroups.

Accordingly, I propose that we switch the implementation of overcloud-without-mergepy from ResourceGroups to AutoscalingGroups. This would be a breaking change for overcloud updates (although no worse than the change from merge.py over to overcloud-without-mergepy), but that also means that there'll never be a better time than now to make it.

I suspect that some folks (Tomas?) have possibly looked into this in the past... can anybody identify any potential obstacles to the change? Two candidates come to mind:

1) The SoftwareDeployments (plural) resource type. I believe we carefully designed that to work with both ResourceGroup and AutoscalingGroup though.

2) The elision feature (https://review.openstack.org/#/c/128365/). Steve, I think this was only implemented for ResourceGroup? An AutoscalingGroup version of this should be feasible though, or do we have better ideas for how to solve it in that context?

cheers,
Zane.
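To make the rolling-update suggestion above concrete, here is a minimal sketch of how the controller group could be throttled to one node at a time, assuming the rolling_updates policy exposed by OS::Heat::AutoScalingGroup (exact property names may vary by Heat release). The nested resource type, the ControllerCount parameter, and the update_timestamp property are illustrative only, not taken from the actual overcloud templates.

  controller_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      desired_capacity: {get_param: ControllerCount}
      min_size: 1
      max_size: {get_param: ControllerCount}
      rolling_updates:
        max_batch_size: 1     # update one controller per batch
        min_in_service: 1     # keep at least one controller up throughout
        pause_time: 30        # seconds to wait between batches
      resource:
        type: OS::TripleO::Controller      # hypothetical provider resource
        properties:
          update_timestamp: {get_param: update_timestamp}

With max_batch_size set to 1, the group members would be updated one at a time rather than all in parallel, which is the behaviour needed to keep an HA controller cluster available while the software update deployment runs.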
