Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
On 08/03/17 10:05, James Slagle wrote:
> On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter wrote:
>> On 07/03/17 14:34, James Slagle wrote:
>>> I've been working on this spec for TripleO:
>>> https://review.openstack.org/#/c/431745/
>>>
>>> which allows users to selectively disable Heat deployment resources
>>> for a given server (or servers, in the case of a *DeploymentGroup
>>> resource).
>>
>> I'm not completely clear on what this means. You can selectively disable
>> resources with conditionals. But I think you mean that you want to
>> selectively disable *changes* to resources?
>
> Yes, that's right. The reason I can't use conditionals is that I still
> want the SoftwareDeploymentGroup resources to be updated, but I may
> want to selectively exclude servers from the group that is passed in
> via the servers property. E.g., instead of updating the deployment
> metadata for *all* computes, I may want to exclude a single compute
> that is temporarily unreachable, without that failing the whole
> stack-update.

Have you seen the filter function?

http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/hot/functions.py#n1279

>>> I started by taking an approach that would be specific to TripleO.
>>> Basically mapping all the deployment resources to a nested stack
>>> containing the logic to selectively disable servers from the
>>> deployment (using yaql) based on a provided parameter value. Here's
>>> the main patch: https://review.openstack.org/#/c/442681/
>>>
>>> After considering that complexity, particularly the yaql expression,
>>> I'm wondering if it would be better to add this support natively to
>>> Heat.
>>>
>>> I was looking at the restricted_actions key in the resource_registry
>>> and was thinking this might be a reasonable place to add such support.
>>> It would require some changes to how restricted_actions work.
>>>
>>> One change would be a method for specifying that restricted_actions
>>> should not fail the stack operation if an action would have otherwise
>>> been triggered. Currently the behavior is to raise an exception and
>>> mark the stack failed if an action needs to be taken but has been
>>> marked restricted. That would need to be tweaked to allow specifying
>>> that we don't want the stack to fail. One thought would be to
>>> change the allowed values of restricted_actions to:
>>>
>>> replace_fail
>>> replace_ignore
>>> update_fail
>>> update_ignore
>>> replace
>>> update
>>>
>>> where replace and update were synonyms for replace_fail/update_fail to
>>> maintain backwards compatibility.
>>
>> Anything that involves the resource definition in the template changing but
>> Heat not modifying the resource is problematic, because that messes with
>> Heat's internal bookkeeping.
>
> I don't think this case would violate that principle. The template +
> environment files would match what Heat has done. After an update, the
> 2 would be in sync as to which servers the updated Deployment resource
> was triggered on.

I'm afraid I can't agree; it isn't that straightforward. Also, if you
want to implement a generic mechanism that applies to every kind of
resource (like restricted_actions do) then it isn't enough for it to
work in one particular use case.

>>> Another change would be to add logic to the Deployment resources
>>> themselves to consider if any restricted_actions have been set on any
>>> Server resources before triggering an updated deployment for a given
>>> server.
>>
>> Why not just a property, "no_new_deployments_please: true"?
>
> That would actually work and be pretty straightforward I think. We
> could have a map parameter with server names and the property that the
> user could use to set the value.

The tricky part, since this would presumably be implemented in the
software deployment API itself, would be how to keep the Heat
SoftwareDeployment resource in sync with what's actually happening, so
that the Right Thing happens again when you start doing new
deployments.

cheers,
Zane.

> The reason why I was initially not considering this route was because
> it doesn't allow the user to disable only some deployments for a given
> server. It's all or nothing. However, it's much simpler than a totally
> flexible option, and it addresses 2 of the largest use cases of this
> feature. I'll look into this route a bit more.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
On Wed, Mar 8, 2017 at 4:08 AM, Steven Hardy wrote:
> On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
>> I've been working on this spec for TripleO:
>> https://review.openstack.org/#/c/431745/
>>
>> which allows users to selectively disable Heat deployment resources
>> for a given server (or servers, in the case of a *DeploymentGroup
>> resource).
>>
>> Some of the main use cases in TripleO for such a feature are scaling
>> out compute nodes where you do not need to rerun Puppet (or make any
>> changes at all) on non-compute nodes, or to exclude nodes from hanging
>> a stack-update if you know they are unreachable or degraded for some
>> reason. There are others, but those are 2 of the major use cases.
>
> Thanks for raising this, I know it's been a pain point for some users of
> TripleO.
>
> However I think we're conflating two different issues here:
>
> 1. Don't re-run puppet (or yum update) when no other changes have happened
>
> 2. Disable deployment resources when changes have happened

Yea, possibly, but (1) doesn't really solve the use cases in the spec.
It'd certainly be a small improvement, but it's not really what users
are asking for.

(2) is much more difficult to reason about because we in fact have to
execute puppet to fully determine if changes have happened.

I don't really think these two are conflated. For some purposes, the
2nd is just a more abstract definition of the first.

For better or worse, part of the reason people are asking for this
feature is because they don't want to undo manual changes. While that's
not something we should really spend a lot of time solving for, the
fact is that OpenStack architecture allows for horizontally scaling
compute nodes without having to touch every other single node in your
deployment, but TripleO can't take advantage of that.

So, just giving users a way to opt out of the generated unique
identifier triggering the puppet applies and other deployments wouldn't
help them if they unintentionally changed some other hiera data that
triggers a deployment.

Plus, we have some deployments that are going to execute every time,
regardless of whether unique identifiers are generated
(hosts-config.yaml).

> (1) is actually very simple, and is the default behavior of Heat
> (SoftwareDeployment resources never update unless either the config
> referenced or the input_values change). We just need to provide an option
> to prevent the DeployIdentifier/UpdateIdentifier timestamps from being
> generated in tripleoclient.
>
> (2) is harder, because the whole point of SoftwareDeploymentGroup is to run
> the exact same configuration on a group of servers, with no exceptions.
>
> As Zane mentions (2) is related to the way ResourceGroup works, but the
> problem here isn't ResourceGroup per se, as it would in theory be pretty
> easy to reimplement SoftwareDeploymentGroup to generate its nested stack
> without inheriting from ResourceGroup (which may be needed if you want a
> flag to make existing Deployments in the group immutable).
>
> I'd suggest we solve (1) and do some testing, it may be enough to solve the
> "don't change computes on scale-out" case at least?

Possibly, as long as no other deployments are triggered. I think of the
use case more as:

  add a compute node(s), don't touch any existing nodes to minimize risk

as opposed to:

  add a compute node(s), don't re-run puppet on any existing nodes as I
  know that it's not needed

For the scale out case, the desire to minimize risk is a big part of
why other nodes don't need to be touched.

> One way to potentially solve (2) would be to unroll the
> SoftwareDeploymentGroup resources and instead generate the Deployment
> resources via jinja2 - this would enable completely removing them on update
> if that's what is desired, similar to what we already do for upgrades to
> e.g. not upgrade any compute nodes.

Thanks, I hadn't considered that approach, but will look into it. I'd
guess you'd still need a parameter or map data fed into the jinja2
templating, so that it would not generate the deployment resources
based on what was desired to be disabled. Or, this could use
conditionals perhaps.

--
-- James Slagle
--
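The "unroll the group" idea discussed above can be sketched roughly as follows. This is only an illustration in plain Python standing in for the jinja2 templating: the function name unroll_deployments, the Deployment_<name> resource naming, and the disabled parameter are all hypothetical and are not taken from TripleO or Heat. Only the OS::Heat::SoftwareDeployment type and its config/server properties are real.

```python
def unroll_deployments(servers, config_id, disabled=()):
    """Generate one deployment resource per server, skipping disabled ones.

    `servers` maps a server name to a server ID. `disabled` is the
    hypothetical "map data fed into the templating" mentioned above,
    reduced here to a collection of server names.
    """
    skip = set(disabled)
    resources = {}
    for name, server_id in sorted(servers.items()):
        if name in skip:
            # No resource is generated at all for a disabled server.
            continue
        resources["Deployment_%s" % name] = {
            "type": "OS::Heat::SoftwareDeployment",
            "properties": {"config": config_id, "server": server_id},
        }
    return resources
```

Because each server gets its own resource in the generated template, leaving one out only affects that server, which is what a single SoftwareDeploymentGroup cannot express.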
Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter wrote:
> On 07/03/17 14:34, James Slagle wrote:
>>
>> I've been working on this spec for TripleO:
>> https://review.openstack.org/#/c/431745/
>>
>> which allows users to selectively disable Heat deployment resources
>> for a given server (or servers, in the case of a *DeploymentGroup
>> resource).
>
> I'm not completely clear on what this means. You can selectively disable
> resources with conditionals. But I think you mean that you want to
> selectively disable *changes* to resources?

Yes, that's right. The reason I can't use conditionals is that I still
want the SoftwareDeploymentGroup resources to be updated, but I may
want to selectively exclude servers from the group that is passed in
via the servers property. E.g., instead of updating the deployment
metadata for *all* computes, I may want to exclude a single compute
that is temporarily unreachable, without that failing the whole
stack-update.

>> I started by taking an approach that would be specific to TripleO.
>> Basically mapping all the deployment resources to a nested stack
>> containing the logic to selectively disable servers from the
>> deployment (using yaql) based on a provided parameter value. Here's
>> the main patch: https://review.openstack.org/#/c/442681/
>>
>> After considering that complexity, particularly the yaql expression,
>> I'm wondering if it would be better to add this support natively to
>> Heat.
>>
>> I was looking at the restricted_actions key in the resource_registry
>> and was thinking this might be a reasonable place to add such support.
>> It would require some changes to how restricted_actions work.
>>
>> One change would be a method for specifying that restricted_actions
>> should not fail the stack operation if an action would have otherwise
>> been triggered. Currently the behavior is to raise an exception and
>> mark the stack failed if an action needs to be taken but has been
>> marked restricted. That would need to be tweaked to allow specifying
>> that we don't want the stack to fail. One thought would be to
>> change the allowed values of restricted_actions to:
>>
>> replace_fail
>> replace_ignore
>> update_fail
>> update_ignore
>> replace
>> update
>>
>> where replace and update were synonyms for replace_fail/update_fail to
>> maintain backwards compatibility.
>
> Anything that involves the resource definition in the template changing but
> Heat not modifying the resource is problematic, because that messes with
> Heat's internal bookkeeping.

I don't think this case would violate that principle. The template +
environment files would match what Heat has done. After an update, the
2 would be in sync as to which servers the updated Deployment resource
was triggered on.

>> Another change would be to add logic to the Deployment resources
>> themselves to consider if any restricted_actions have been set on any
>> Server resources before triggering an updated deployment for a given
>> server.
>
> Why not just a property, "no_new_deployments_please: true"?

That would actually work and be pretty straightforward I think. We
could have a map parameter with server names and the property that the
user could use to set the value.

The reason why I was initially not considering this route was because
it doesn't allow the user to disable only some deployments for a given
server. It's all or nothing. However, it's much simpler than a totally
flexible option, and it addresses 2 of the largest use cases of this
feature. I'll look into this route a bit more.

--
-- James Slagle
--
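The exclusion described above, where a deployment group is still updated but one server is left out of the map passed via the servers property, amounts to something like the following minimal sketch. The function name filter_servers and the disabled_names parameter are hypothetical, chosen only for illustration; the actual TripleO patch expresses this in yaql rather than Python.

```python
def filter_servers(servers, disabled_names):
    """Return a copy of the servers map without the disabled servers.

    `servers` maps a key (here the server name) to a server ID, which is
    the general shape of the map a deployment group receives.
    `disabled_names` is an assumed user-supplied list of names to skip.
    """
    disabled = set(disabled_names)
    return {
        name: server_id
        for name, server_id in servers.items()
        if name not in disabled
    }
```

For example, with computes compute-0 and compute-1 and compute-1 unreachable, passing ["compute-1"] leaves only compute-0 in the map, so the stack-update never waits on the unreachable node.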
Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
> I've been working on this spec for TripleO:
> https://review.openstack.org/#/c/431745/
>
> which allows users to selectively disable Heat deployment resources
> for a given server (or servers, in the case of a *DeploymentGroup
> resource).
>
> Some of the main use cases in TripleO for such a feature are scaling
> out compute nodes where you do not need to rerun Puppet (or make any
> changes at all) on non-compute nodes, or to exclude nodes from hanging
> a stack-update if you know they are unreachable or degraded for some
> reason. There are others, but those are 2 of the major use cases.

Thanks for raising this, I know it's been a pain point for some users
of TripleO.

However I think we're conflating two different issues here:

1. Don't re-run puppet (or yum update) when no other changes have happened

2. Disable deployment resources when changes have happened

(1) is actually very simple, and is the default behavior of Heat
(SoftwareDeployment resources never update unless either the config
referenced or the input_values change). We just need to provide an
option to prevent the DeployIdentifier/UpdateIdentifier timestamps from
being generated in tripleoclient.

(2) is harder, because the whole point of SoftwareDeploymentGroup is to
run the exact same configuration on a group of servers, with no
exceptions.

As Zane mentions (2) is related to the way ResourceGroup works, but the
problem here isn't ResourceGroup per se, as it would in theory be
pretty easy to reimplement SoftwareDeploymentGroup to generate its
nested stack without inheriting from ResourceGroup (which may be needed
if you want a flag to make existing Deployments in the group
immutable).

I'd suggest we solve (1) and do some testing, it may be enough to solve
the "don't change computes on scale-out" case at least?

One way to potentially solve (2) would be to unroll the
SoftwareDeploymentGroup resources and instead generate the Deployment
resources via jinja2 - this would enable completely removing them on
update if that's what is desired, similar to what we already do for
upgrades to e.g. not upgrade any compute nodes.

Steve

> I started by taking an approach that would be specific to TripleO.
> Basically mapping all the deployment resources to a nested stack
> containing the logic to selectively disable servers from the
> deployment (using yaql) based on a provided parameter value. Here's
> the main patch: https://review.openstack.org/#/c/442681/
>
> After considering that complexity, particularly the yaql expression,
> I'm wondering if it would be better to add this support natively to
> Heat.
>
> I was looking at the restricted_actions key in the resource_registry
> and was thinking this might be a reasonable place to add such support.
> It would require some changes to how restricted_actions work.
>
> One change would be a method for specifying that restricted_actions
> should not fail the stack operation if an action would have otherwise
> been triggered. Currently the behavior is to raise an exception and
> mark the stack failed if an action needs to be taken but has been
> marked restricted. That would need to be tweaked to allow specifying
> that we don't want the stack to fail. One thought would be to
> change the allowed values of restricted_actions to:
>
> replace_fail
> replace_ignore
> update_fail
> update_ignore
> replace
> update
>
> where replace and update were synonyms for replace_fail/update_fail to
> maintain backwards compatibility.
>
> Another change would be to add logic to the Deployment resources
> themselves to consider if any restricted_actions have been set on any
> Server resources before triggering an updated deployment for a given
> server.
>
> It also might be nice to allow specifying restricted_actions on the
> server's name property (which typically is the hostname) instead of
> having to use the resource name. The reason is that it is not
> really feasible to expect operators/users to have to represent the
> full nested_stack structure in their resource_registry. They would
> have to query and record nested_stack names just to refer to a given
> server resource. Each ResourceGroup nested stack would have to be
> individually represented, etc. Unless there is another way I'm
> overlooking.
>
> Whether or not the restricted_actions approach is taken, is Heat
> interested in this functionality natively? I think it would make for a
> much cleaner implementation than something TripleO specific. I can
> work on a Heat spec if there's interest, though I'd like to get some
> early feedback.
>
> Thanks.
>
> --
> -- James Slagle
> --

--
Ste
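Point (1) above can be illustrated with a small model. This is a sketch under stated assumptions, not Heat or tripleoclient code: the helper names build_deployment and needs_update, the dictionary layout, and the use of "DeployIdentifier" as an input key are chosen for illustration. It only mirrors the rule stated in the message, that a deployment updates when its config or input_values change, and shows why a generated timestamp forces an update every time.

```python
import time


def build_deployment(config, inputs, add_identifier=True, now=None):
    """Build a toy deployment definition.

    When add_identifier is true, a timestamp is added to the inputs,
    standing in for the generated DeployIdentifier value.
    """
    input_values = dict(inputs)
    if add_identifier:
        stamp = int(time.time()) if now is None else now
        input_values["DeployIdentifier"] = stamp
    return {"config": config, "input_values": input_values}


def needs_update(old, new):
    """A deployment is re-triggered only if config or input_values differ."""
    return (old["config"] != new["config"]
            or old["input_values"] != new["input_values"])
```

With the identifier enabled, two otherwise identical definitions built at different times always differ, so every stack-update re-runs the deployment. With it disabled, an unchanged definition is left alone, which is the option being suggested.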
Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
On 07/03/17 14:34, James Slagle wrote:
> I've been working on this spec for TripleO:
> https://review.openstack.org/#/c/431745/
>
> which allows users to selectively disable Heat deployment resources
> for a given server (or servers, in the case of a *DeploymentGroup
> resource).

I'm not completely clear on what this means. You can selectively
disable resources with conditionals. But I think you mean that you want
to selectively disable *changes* to resources?

> Some of the main use cases in TripleO for such a feature are scaling
> out compute nodes where you do not need to rerun Puppet (or make any
> changes at all) on non-compute nodes, or to exclude nodes from hanging
> a stack-update if you know they are unreachable or degraded for some
> reason. There are others, but those are 2 of the major use cases.

I think you're running up against a limitation of the scaling group
implementation in Heat.

In AWS Autoscaling, you have a LaunchConfig associated with a group
that is used when scaling up to create new members, but existing
members are not changed when you specify a new LaunchConfig unless you
also specifically include a rolling update UpdatePolicy. (That isn't a
great interface in CloudFormation, but it works and I can't actually
think of anything better.) Heat's AWS-style resources work similarly.

Heat's native autoscaling group resources don't have a separate
LaunchConfig, and although they used to work similarly to the AWS ones
with respect to when they would update existing members, IIRC somebody
decided that was a "bug" and "fixed" it.

In any event, TripleO uses ResourceGroup, and the very existence of
ResourceGroup is predicated on the idea that you can just generate the
nested template by making copies of the inline resource definition -
that is, the idea that you'll *never* need this feature which it turns
out you do, in fact, need. TripleO can't move away from ResourceGroup
because it relies on it to auto-assign pre-chosen names for specific
servers.

Senlin, for the record, gets this right.

> I started by taking an approach that would be specific to TripleO.
> Basically mapping all the deployment resources to a nested stack
> containing the logic to selectively disable servers from the
> deployment (using yaql) based on a provided parameter value. Here's
> the main patch: https://review.openstack.org/#/c/442681/
>
> After considering that complexity, particularly the yaql expression,
> I'm wondering if it would be better to add this support natively to
> Heat.
>
> I was looking at the restricted_actions key in the resource_registry
> and was thinking this might be a reasonable place to add such support.
> It would require some changes to how restricted_actions work.
>
> One change would be a method for specifying that restricted_actions
> should not fail the stack operation if an action would have otherwise
> been triggered. Currently the behavior is to raise an exception and
> mark the stack failed if an action needs to be taken but has been
> marked restricted. That would need to be tweaked to allow specifying
> that we don't want the stack to fail. One thought would be to
> change the allowed values of restricted_actions to:
>
> replace_fail
> replace_ignore
> update_fail
> update_ignore
> replace
> update
>
> where replace and update were synonyms for replace_fail/update_fail to
> maintain backwards compatibility.

Anything that involves the resource definition in the template changing
but Heat not modifying the resource is problematic, because that messes
with Heat's internal bookkeeping.

> Another change would be to add logic to the Deployment resources
> themselves to consider if any restricted_actions have been set on any
> Server resources before triggering an updated deployment for a given
> server.

Why not just a property, "no_new_deployments_please: true"?

> It also might be nice to allow specifying restricted_actions on the
> server's name property (which typically is the hostname) instead of
> having to use the resource name. The reason is that it is not
> really feasible to expect operators/users to have to represent the
> full nested_stack structure in their resource_registry. They would
> have to query and record nested_stack names just to refer to a given
> server resource. Each ResourceGroup nested stack would have to be
> individually represented, etc. Unless there is another way I'm
> overlooking.
>
> Whether or not the restricted_actions approach is taken, is Heat
> interested in this functionality natively? I think it would make for a
> much cleaner implementation than something TripleO specific. I can
> work on a Heat spec if there's interest, though I'd like to get some
> early feedback.
>
> Thanks.
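The LaunchConfig behaviour described above can be modelled in a few lines. This is a toy sketch only: the ScalingGroup class and its method names are hypothetical and do not correspond to AWS, CloudFormation, Heat, or Senlin APIs. It shows the semantics under discussion, where a new launch configuration affects only members created afterwards unless a rolling update is explicitly requested.

```python
class ScalingGroup:
    """Toy model of a scaling group with a separate launch config."""

    def __init__(self, launch_config):
        self.launch_config = launch_config
        # Each entry records the config that member was created from.
        self.members = []

    def scale_to(self, size):
        # New members are always created from the current launch config.
        while len(self.members) < size:
            self.members.append(self.launch_config)
        # Scaling down drops members from the end of the list.
        del self.members[size:]

    def set_launch_config(self, launch_config, rolling_update=False):
        self.launch_config = launch_config
        if rolling_update:
            # Only an explicit update policy touches existing members.
            self.members = [launch_config] * len(self.members)
```

ResourceGroup, by contrast, regenerates every member from the one inline definition, which is why the thread keeps running into the lack of a way to leave existing members untouched.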
[openstack-dev] [TripleO][Heat] Selectively disabling deployment resources
I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update were synonyms for replace_fail/update_fail to
maintain backwards compatibility.

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on any
Server resources before triggering an updated deployment for a given
server.

It also might be nice to allow specifying restricted_actions on the
server's name property (which typically is the hostname) instead of
having to use the resource name. The reason is that it is not
really feasible to expect operators/users to have to represent the
full nested_stack structure in their resource_registry. They would
have to query and record nested_stack names just to refer to a given
server resource. Each ResourceGroup nested stack would have to be
individually represented, etc. Unless there is another way I'm
overlooking.

Whether or not the restricted_actions approach is taken, is Heat
interested in this functionality natively? I think it would make for a
much cleaner implementation than something TripleO specific. I can
work on a Heat spec if there's interest, though I'd like to get some
early feedback.

Thanks.

--
-- James Slagle
--
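The proposed restricted_actions values could behave roughly as in the sketch below. Everything here is hypothetical: the *_fail and *_ignore values exist only as a proposal in this message, and the function names normalize and check_restricted are invented for illustration, not taken from Heat. The sketch treats the existing replace and update values as synonyms for the *_fail variants, as suggested for backwards compatibility.

```python
# Values proposed in this thread; not existing Heat options.
PROPOSED_VALUES = {
    "replace_fail", "replace_ignore",
    "update_fail", "update_ignore",
}
# Existing values, kept as synonyms for backwards compatibility.
SYNONYMS = {"replace": "replace_fail", "update": "update_fail"}


def normalize(value):
    """Return (action, on_trigger) for one restricted_actions entry."""
    canonical = SYNONYMS.get(value, value)
    if canonical not in PROPOSED_VALUES:
        raise ValueError("unknown restricted action: %r" % value)
    action, on_trigger = canonical.split("_")
    return action, on_trigger


def check_restricted(restrictions, needed_action):
    """Decide what happens when needed_action meets the restrictions.

    Returns "proceed" if the action is not restricted, "skip" if it is
    restricted with *_ignore, and raises if restricted with *_fail.
    """
    for value in restrictions:
        action, on_trigger = normalize(value)
        if action != needed_action:
            continue
        if on_trigger == "ignore":
            return "skip"
        raise RuntimeError("%s is restricted" % needed_action)
    return "proceed"
```

Under this reading, update_ignore would let a stack-update continue while leaving the restricted resource alone, whereas plain update keeps today's behaviour of failing the stack.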