Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-08 Thread Zane Bitter

On 08/03/17 10:05, James Slagle wrote:

> On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter wrote:
>
>> On 07/03/17 14:34, James Slagle wrote:
>>
>>> I've been working on this spec for TripleO:
>>> https://review.openstack.org/#/c/431745/
>>>
>>> which allows users to selectively disable Heat deployment resources
>>> for a given server (or servers, in the case of a *DeploymentGroup
>>> resource).
>>
>> I'm not completely clear on what this means. You can selectively disable
>> resources with conditionals. But I think you mean that you want to
>> selectively disable *changes* to resources?


> Yes, that's right. The reason I can't use conditionals is that I still
> want the SoftwareDeploymentGroup resources to be updated, but I may
> want to selectively exclude servers from the group that is passed in
> via the servers property. E.g., instead of updating the deployment
> metadata for *all* computes, I may want to exclude a single compute
> that is temporarily unreachable, without that failing the whole
> stack-update.


Have you seen the filter function?

http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/hot/functions.py#n1279
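For readers unfamiliar with it: as we understand the `filter` intrinsic linked above, it takes a list of values to remove and a list to remove them from. The Python below is a minimal emulation of that behaviour, not Heat's own code; `heat_filter` and the hostnames are hypothetical. Note that it works on lists, whereas the `servers` property of SoftwareDeploymentGroup is a map of names to server IDs, so applying it there would need some extra plumbing.

```python
from typing import Any, List


def heat_filter(values: List[Any], sequence: List[Any]) -> List[Any]:
    """Return `sequence` with every item that appears in `values` removed.

    Hypothetical emulation of Heat's `filter` intrinsic; the order of the
    remaining items is preserved.
    """
    return [item for item in sequence if item not in values]


# Hypothetical example: drop an unreachable compute node from a list of hostnames.
hostnames = ["overcloud-compute-0", "overcloud-compute-1", "overcloud-compute-2"]
unreachable = ["overcloud-compute-1"]
print(heat_filter(unreachable, hostnames))
# ['overcloud-compute-0', 'overcloud-compute-2']
```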


>>> I started by taking an approach that would be specific to TripleO.
>>> Basically mapping all the deployment resources to a nested stack
>>> containing the logic to selectively disable servers from the
>>> deployment (using yaql) based on a provided parameter value. Here's
>>> the main patch: https://review.openstack.org/#/c/442681/
>>>
>>> After considering that complexity, particularly the yaql expression,
>>> I'm wondering if it would be better to add this support natively to
>>> Heat.
>>>
>>> I was looking at the restricted_actions key in the resource_registry
>>> and was thinking this might be a reasonable place to add such support.
>>> It would require some changes to how restricted_actions work.
>>>
>>> One change would be a method for specifying that restricted_actions
>>> should not fail the stack operation if an action would have otherwise
>>> been triggered. Currently the behavior is to raise an exception and
>>> mark the stack failed if an action needs to be taken but has been
>>> marked restricted. That would need to be tweaked to allow specifying
>>> that we don't want the stack to fail. One thought would be to
>>> change the allowed values of restricted_actions to:
>>>
>>> replace_fail
>>> replace_ignore
>>> update_fail
>>> update_ignore
>>> replace
>>> update
>>>
>>> where replace and update were synonyms for replace_fail/update_fail to
>>> maintain backwards compatibility.
>>
>> Anything that involves the resource definition in the template changing but
>> Heat not modifying the resource is problematic, because that messes with
>> Heat's internal bookkeeping.
>
> I don't think this case would violate that principle. The template +
> environment files would match what Heat has done. After an update, the
> two would be in sync as to which servers the updated Deployment resource
> was triggered on.


I'm afraid I can't agree; it isn't that straightforward. Also, if you 
want to implement a generic mechanism that applies to every kind of 
resource (like restricted_actions do) then it isn't enough for it to 
work in one particular use case.



>>> Another change would be to add logic to the Deployment resources
>>> themselves to consider if any restricted_actions have been set on a
>>> Server resource before triggering an updated deployment for a given
>>> server.
>>
>> Why not just a property, "no_new_deployments_please: true"?
>
> That would actually work and be pretty straightforward I think. We
> could have a map parameter with server names and the property that the
> user could use to set the value.


The tricky part, since this would presumably be implemented in the 
software deployment API itself, would be how to keep the Heat 
SoftwareDeployment resource in sync with what's actually happening, so 
that the Right Thing happens again when you start doing new deployments.


cheers,
Zane.


> The reason I initially didn't consider this route is that
> it doesn't allow the user to disable only some deployments for a given
> server. It's all or nothing. However, it's much simpler than a totally
> flexible option, and it addresses 2 of the largest use cases of this
> feature. I'll look into this route a bit more.




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-08 Thread James Slagle
On Wed, Mar 8, 2017 at 4:08 AM, Steven Hardy  wrote:
> On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
>> I've been working on this spec for TripleO:
>> https://review.openstack.org/#/c/431745/
>>
>> which allows users to selectively disable Heat deployment resources
>> for a given server (or servers, in the case of a *DeploymentGroup
>> resource).
>>
>> Some of the main use cases in TripleO for such a feature are scaling
>> out compute nodes where you do not need to rerun Puppet (or make any
>> changes at all) on non-compute nodes, or to exclude nodes from hanging
>> a stack-update if you know they are unreachable or degraded for some
>> reason. There are others, but those are 2 of the major use cases.
>
> Thanks for raising this, I know it's been a pain point for some users of
> TripleO.
>
> However I think we're conflating two different issues here:
>
> 1. Don't re-run puppet (or yum update) when no other changes have happened
>
> 2. Disable deployment resources when changes have happened

Yea, possibly, but (1) doesn't really solve the use cases in the spec.
It'd certainly be a small improvement, but it's not really what users
are asking for.

(2) is much more difficult to reason about because we in fact have to
execute puppet to fully determine if changes have happened.

I don't really think these two are conflated. For some purposes, the
2nd is just a more abstract definition of the first. For better or
worse, part of the reason people are asking for this feature is
that they don't want Puppet to undo their manual changes. While that's not
something we should really spend a lot of time solving for, the fact
is that OpenStack architecture allows for horizontally scaling compute
nodes without having to touch every other node in your deployment,
but TripleO can't take advantage of that.

So just giving users a way to opt out of the generated unique
identifiers triggering the Puppet applies and other deployments
wouldn't help them if they unintentionally changed some other hiera
data that triggers a deployment.

Plus, we have some deployments that execute every time, even
without unique identifiers being generated (hosts-config.yaml).

> (1) is actually very simple, and is the default behavior of Heat
> (SoftwareDeployment resources never update unless either the config
> referenced or the input_values change).  We just need to provide an option
> to disable the DeployIdentifier/UpdateIdentifier timestamps from being
> generated in tripleoclient.
>
> (2) is harder, because the whole point of SoftwareDeploymentGroup is to run
> the exact same configuration on a group of servers, with no exceptions.
>
> As Zane mentions, (2) is related to the way ResourceGroup works, but the
> problem here isn't ResourceGroup per se, as it would in theory be pretty
> easy to reimplement SoftwareDeploymentGroup to generate its nested stack
> without inheriting from ResourceGroup (which may be needed if you want a
> flag to make existing Deployments in the group immutable).
>
> I'd suggest we solve (1) and do some testing; it may be enough to solve the
> "don't change computes on scale-out" case at least?

Possibly, as long as no other deployments are triggered. I think of
the use case more as:

add a compute node(s), don't touch any existing nodes to minimize risk

as opposed to:

add a compute node(s), don't re-run puppet on any existing nodes as I
know that it's not needed

For the scale out case, the desire to minimize risk is a big part of
why other nodes don't need to be touched.

>
> One way to potentially solve (2) would be to unroll the
> SoftwareDeploymentGroup resources and instead generate the Deployment
> resources via jinja2 - this would enable completely removing them on update
> if that's what is desired, similar to what we already do for upgrades to
> e.g not upgrade any compute nodes.

Thanks, I hadn't considered that approach, but will look into it. I'd
guess you'd still need a parameter or map data fed into the jinja2
templating, so that it would not generate the deployment resources
that are meant to be disabled. Or, this could perhaps use
conditionals.



-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-08 Thread James Slagle
On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter  wrote:
> On 07/03/17 14:34, James Slagle wrote:
>>
>> I've been working on this spec for TripleO:
>> https://review.openstack.org/#/c/431745/
>>
>> which allows users to selectively disable Heat deployment resources
>> for a given server (or servers, in the case of a *DeploymentGroup
>> resource).
>
>
> I'm not completely clear on what this means. You can selectively disable
> resources with conditionals. But I think you mean that you want to
> selectively disable *changes* to resources?

Yes, that's right. The reason I can't use conditionals is that I still
want the SoftwareDeploymentGroup resources to be updated, but I may
want to selectively exclude servers from the group that is passed in
via the servers property. E.g., instead of updating the deployment
metadata for *all* computes, I may want to exclude a single compute
that is temporarily unreachable, without that failing the whole
stack-update.

>> I started by taking an approach that would be specific to TripleO.
>> Basically mapping all the deployment resources to a nested stack
>> containing the logic to selectively disable servers from the
>> deployment (using yaql) based on a provided parameter value. Here's
>> the main patch: https://review.openstack.org/#/c/442681/
>>
>> After considering that complexity, particularly the yaql expression,
>> I'm wondering if it would be better to add this support natively to
>> Heat.
>>
>> I was looking at the restricted_actions key in the resource_registry
>> and was thinking this might be a reasonable place to add such support.
>> It would require some changes to how restricted_actions work.
>>
>> One change would be a method for specifying that restricted_actions
>> should not fail the stack operation if an action would have otherwise
>> been triggered. Currently the behavior is to raise an exception and
>> mark the stack failed if an action needs to be taken but has been
>> marked restricted. That would need to be tweaked to allow specifying
>> that we don't want the stack to fail. One thought would be to
>> change the allowed values of restricted_actions to:
>>
>> replace_fail
>> replace_ignore
>> update_fail
>> update_ignore
>> replace
>> update
>>
>> where replace and update were synonyms for replace_fail/update_fail to
>> maintain backwards compatibility.
>
>
> Anything that involves the resource definition in the template changing but
> Heat not modifying the resource is problematic, because that messes with
> Heat's internal bookkeeping.

I don't think this case would violate that principle. The template +
environment files would match what Heat has done. After an update, the
two would be in sync as to which servers the updated Deployment resource
was triggered on.

>
>> Another change would be to add logic to the Deployment resources
>> themselves to consider if any restricted_actions have been set on a
>> Server resource before triggering an updated deployment for a given
>> server.
>
>
> Why not just a property, "no_new_deployments_please: true"?

That would actually work and be pretty straightforward I think. We
could have a map parameter with server names and the property that the
user could use to set the value.

The reason I initially didn't consider this route is that
it doesn't allow the user to disable only some deployments for a given
server. It's all or nothing. However, it's much simpler than a totally
flexible option, and it addresses 2 of the largest use cases of this
feature. I'll look into this route a bit more.
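A sketch of the map-parameter idea, assuming a hypothetical per-server flag named after Zane's `no_new_deployments_please` suggestion. The map is keyed by server name; a true value suppresses *every* pending deployment for that server, which is exactly the all-or-nothing limitation described above. The function, server names and deployment names are illustrative only, not an existing Heat or TripleO interface.

```python
from typing import Dict, List


def deployments_to_trigger(
    pending: Dict[str, List[str]], no_new_deployments: Dict[str, bool]
) -> Dict[str, List[str]]:
    """Drop all pending deployments for servers flagged in the (hypothetical) map.

    There is no per-deployment granularity: a flagged server gets nothing.
    """
    return {
        server: deployments
        for server, deployments in pending.items()
        if not no_new_deployments.get(server, False)
    }


pending = {
    "overcloud-controller-0": ["ControllerDeployment", "HostsDeployment"],
    "overcloud-compute-0": ["ComputeDeployment", "HostsDeployment"],
}
flags = {"overcloud-controller-0": True}
print(deployments_to_trigger(pending, flags))
# {'overcloud-compute-0': ['ComputeDeployment', 'HostsDeployment']}
```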

-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-08 Thread Steven Hardy
On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
> I've been working on this spec for TripleO:
> https://review.openstack.org/#/c/431745/
>
> which allows users to selectively disable Heat deployment resources
> for a given server (or servers, in the case of a *DeploymentGroup
> resource).
> 
> Some of the main use cases in TripleO for such a feature are scaling
> out compute nodes where you do not need to rerun Puppet (or make any
> changes at all) on non-compute nodes, or to exclude nodes from hanging
> a stack-update if you know they are unreachable or degraded for some
> reason. There are others, but those are 2 of the major use cases.

Thanks for raising this, I know it's been a pain point for some users of
TripleO.

However I think we're conflating two different issues here:

1. Don't re-run puppet (or yum update) when no other changes have happened

2. Disable deployment resources when changes have happened

(1) is actually very simple, and is the default behavior of Heat
(SoftwareDeployment resources never update unless either the config
referenced or the input_values change).  We just need to provide an option
to disable the DeployIdentifier/UpdateIdentifier timestamps from being
generated in tripleoclient.
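A minimal model of that behaviour, assuming only what is stated above: a deployment is re-triggered when its referenced config or its input_values change. The helpers and the record layout are hypothetical; only the DeployIdentifier name comes from TripleO. Passing `None` models the proposed option to stop generating the timestamp.

```python
from typing import Any, Dict, Optional


def make_deployment(config_id: str, deploy_identifier: Optional[str] = None) -> Dict[str, Any]:
    """Build a toy deployment record; `None` models disabling the timestamp."""
    inputs: Dict[str, Any] = {}
    if deploy_identifier is not None:
        inputs["DeployIdentifier"] = deploy_identifier
    return {"config": config_id, "input_values": inputs}


def needs_update(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """A deployment is re-triggered only if its config or input_values changed."""
    return old["config"] != new["config"] or old["input_values"] != new["input_values"]


# A fresh timestamp on every run changes input_values, so the deployment re-runs.
print(needs_update(make_deployment("cfg-1", "1488900000"), make_deployment("cfg-1", "1488990000")))  # True
# With the identifier disabled and the same config, nothing is re-triggered.
print(needs_update(make_deployment("cfg-1"), make_deployment("cfg-1")))  # False
# A genuine config change still triggers the deployment.
print(needs_update(make_deployment("cfg-1"), make_deployment("cfg-2")))  # True
```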

(2) is harder, because the whole point of SoftwareDeploymentGroup is to run
the exact same configuration on a group of servers, with no exceptions.

As Zane mentions, (2) is related to the way ResourceGroup works, but the
problem here isn't ResourceGroup per se, as it would in theory be pretty
easy to reimplement SoftwareDeploymentGroup to generate its nested stack
without inheriting from ResourceGroup (which may be needed if you want a
flag to make existing Deployments in the group immutable).
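As a rough illustration of that idea, here is a hypothetical generator (not Heat's ResourceGroup or SoftwareDeploymentGroup code) that builds nested-template resources from a `servers` map. Every member gets a fresh copy of the current definition, except members named in an assumed `frozen` set, which keep their previously generated definition so that Heat would see no change for them. `frozen` is an invented stand-in for the immutability flag; the resource layout only loosely follows OS::Heat::SoftwareDeployment.

```python
import copy
from typing import Any, Dict, Optional, Set


def build_nested_resources(
    servers: Dict[str, str],
    definition: Dict[str, Any],
    previous: Optional[Dict[str, Dict[str, Any]]] = None,
    frozen: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Generate one deployment resource per server name.

    Frozen members that already exist keep their old definition; a frozen
    name with no previous definition (e.g. a new server) still gets a copy.
    """
    previous = previous or {}
    frozen = frozen or set()
    resources: Dict[str, Dict[str, Any]] = {}
    for name, server_id in servers.items():
        if name in frozen and name in previous:
            resources[name] = previous[name]  # unchanged, so no new deployment
            continue
        member = copy.deepcopy(definition)
        member["properties"]["server"] = server_id
        resources[name] = member
    return resources


old_def = {"type": "OS::Heat::SoftwareDeployment", "properties": {"config": "cfg-1"}}
new_def = {"type": "OS::Heat::SoftwareDeployment", "properties": {"config": "cfg-2"}}
servers = {"compute-0": "uuid-0", "compute-1": "uuid-1"}
first = build_nested_resources(servers, old_def)
second = build_nested_resources(servers, new_def, previous=first, frozen={"compute-1"})
print(second["compute-0"]["properties"]["config"])  # cfg-2
print(second["compute-1"]["properties"]["config"])  # cfg-1
```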

I'd suggest we solve (1) and do some testing; it may be enough to solve the
"don't change computes on scale-out" case at least?

One way to potentially solve (2) would be to unroll the
SoftwareDeploymentGroup resources and instead generate the Deployment
resources via jinja2 - this would enable completely removing them on update
if that's what is desired, similar to what we already do for upgrades
(e.g. to avoid upgrading any compute nodes).
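TripleO would do this in its jinja2-rendered templates; to keep the example dependency-free, the sketch below uses plain Python string formatting as a stand-in for jinja2. It emits one SoftwareDeployment resource per server and simply omits disabled servers, so on the next stack-update Heat would see those resources gone from the template. The server, config and resource names, and the `disabled` list, are hypothetical.

```python
from typing import Iterable, List

# Doubled braces are literal braces after str.format().
RESOURCE_TEMPLATE = """\
  {name}Deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {{get_resource: {config}}}
      server: {{get_resource: {name}}}
"""


def render_deployments(servers: Iterable[str], disabled: Iterable[str], config: str) -> str:
    """Render one deployment resource per server, skipping disabled servers."""
    skip = set(disabled)
    parts: List[str] = ["resources:\n"]
    for name in servers:
        if name in skip:
            continue  # omitted from the rendered template entirely
        parts.append(RESOURCE_TEMPLATE.format(name=name, config=config))
    return "".join(parts)


print(render_deployments(["Compute0", "Compute1"], ["Compute1"], "ComputeConfig"))
```

Unrolling trades the single group resource for one explicit resource per server, which is what makes per-server omission possible.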

Steve

> 
> I started by taking an approach that would be specific to TripleO.
> Basically mapping all the deployment resources to a nested stack
> containing the logic to selectively disable servers from the
> deployment (using yaql) based on a provided parameter value. Here's
> the main patch: https://review.openstack.org/#/c/442681/
> 
> After considering that complexity, particularly the yaql expression,
> I'm wondering if it would be better to add this support natively to
> Heat.
> 
> I was looking at the restricted_actions key in the resource_registry
> and was thinking this might be a reasonable place to add such support.
> It would require some changes to how restricted_actions work.
> 
> One change would be a method for specifying that restricted_actions
> should not fail the stack operation if an action would have otherwise
> been triggered. Currently the behavior is to raise an exception and
> mark the stack failed if an action needs to be taken but has been
> marked restricted. That would need to be tweaked to allow specifying
> that we don't want the stack to fail. One thought would be to
> change the allowed values of restricted_actions to:
> 
> replace_fail
> replace_ignore
> update_fail
> update_ignore
> replace
> update
> 
> where replace and update were synonyms for replace_fail/update_fail to
> maintain backwards compatibility.
> 
> Another change would be to add logic to the Deployment resources
> themselves to consider if any restricted_actions have been set on a
> Server resource before triggering an updated deployment for a given
> server.
> 
> It also might be nice to allow specifying restricted_actions on the
> server's name property (which typically is the hostname) instead of
> having to use the resource name. The reason is that it is not
> really feasible to expect operators/users to have to represent the
> full nested_stack structure in their resource_registry. They would
> have to query and record nested_stack names just to refer to a given
> server resource. Each ResourceGroup nested stack would have to be
> individually represented, etc. Unless there is another way I'm
> overlooking.
> 
> Whether or not the restricted_actions approach is taken, is Heat
> interested in this functionality natively? I think it would make for a
> much cleaner implementation than something TripleO specific. I can
> work on a Heat spec if there's interest, though I'd like to get some
> early feedback.
> 
> Thanks.
> 
> -- 
> -- James Slagle
> --

Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-07 Thread Zane Bitter

On 07/03/17 14:34, James Slagle wrote:

> I've been working on this spec for TripleO:
> https://review.openstack.org/#/c/431745/
>
> which allows users to selectively disable Heat deployment resources
> for a given server (or servers, in the case of a *DeploymentGroup
> resource).


I'm not completely clear on what this means. You can selectively disable 
resources with conditionals. But I think you mean that you want to 
selectively disable *changes* to resources?



> Some of the main use cases in TripleO for such a feature are scaling
> out compute nodes where you do not need to rerun Puppet (or make any
> changes at all) on non-compute nodes, or to exclude nodes from hanging
> a stack-update if you know they are unreachable or degraded for some
> reason. There are others, but those are 2 of the major use cases.


I think you're running up against a limitation of the scaling group 
implementation in Heat. In AWS Autoscaling, you have a LaunchConfig 
associated with a group that is used when scaling up to create new 
members, but existing members are not changed when you specify a new 
LaunchConfig unless you also specifically include a rolling update 
UpdatePolicy. (That isn't a great interface in CloudFormation, but it 
works and I can't actually think of anything better.)
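A toy model of the AWS behaviour described above: new members are built from the group's current LaunchConfig, while existing members keep the one they were launched with unless a rolling-update policy is in effect. The class and attribute names are invented for illustration; this is not the CloudFormation or Heat API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyScalingGroup:
    launch_config: str
    rolling_update: bool = False
    # member name -> launch config that member was built from
    members: Dict[str, str] = field(default_factory=dict)

    def set_launch_config(self, new_config: str) -> None:
        self.launch_config = new_config
        if self.rolling_update:
            # Only with an explicit rolling-update policy are existing members changed.
            for name in self.members:
                self.members[name] = new_config

    def scale_to(self, size: int) -> None:
        # Scale-up only: new members use the current config, existing ones are untouched.
        for i in range(len(self.members), size):
            self.members[f"member-{i}"] = self.launch_config


group = ToyScalingGroup("lc-v1")
group.scale_to(2)
group.set_launch_config("lc-v2")
group.scale_to(3)
print(group.members)
# {'member-0': 'lc-v1', 'member-1': 'lc-v1', 'member-2': 'lc-v2'}
```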


Heat's AWS-style resources work similarly. Heat's native autoscaling 
group resources don't have a separate LaunchConfig, and although they 
used to work similarly to the AWS ones with respect to when they would 
update existing members, IIRC somebody decided that was a "bug" and 
"fixed" it.


In any event, TripleO uses ResourceGroup, and the very existence of 
ResourceGroup is predicated on the idea that you can just generate the 
nested template by making copies of the inline resource definition - 
that is, the idea that you'll *never* need this feature, which it turns 
out you do, in fact, need. TripleO can't move away from ResourceGroup 
because it relies on it to auto-assign pre-chosen names for specific 
servers.


Senlin, for the record, gets this right.


> I started by taking an approach that would be specific to TripleO.
> Basically mapping all the deployment resources to a nested stack
> containing the logic to selectively disable servers from the
> deployment (using yaql) based on a provided parameter value. Here's
> the main patch: https://review.openstack.org/#/c/442681/
>
> After considering that complexity, particularly the yaql expression,
> I'm wondering if it would be better to add this support natively to
> Heat.
>
> I was looking at the restricted_actions key in the resource_registry
> and was thinking this might be a reasonable place to add such support.
> It would require some changes to how restricted_actions work.
>
> One change would be a method for specifying that restricted_actions
> should not fail the stack operation if an action would have otherwise
> been triggered. Currently the behavior is to raise an exception and
> mark the stack failed if an action needs to be taken but has been
> marked restricted. That would need to be tweaked to allow specifying
> that we don't want the stack to fail. One thought would be to
> change the allowed values of restricted_actions to:
>
> replace_fail
> replace_ignore
> update_fail
> update_ignore
> replace
> update
>
> where replace and update were synonyms for replace_fail/update_fail to
> maintain backwards compatibility.


Anything that involves the resource definition in the template changing 
but Heat not modifying the resource is problematic, because that messes 
with Heat's internal bookkeeping.



> Another change would be to add logic to the Deployment resources
> themselves to consider if any restricted_actions have been set on a
> Server resource before triggering an updated deployment for a given
> server.


Why not just a property, "no_new_deployments_please: true"?


> It also might be nice to allow specifying restricted_actions on the
> server's name property (which typically is the hostname) instead of
> having to use the resource name. The reason is that it is not
> really feasible to expect operators/users to have to represent the
> full nested_stack structure in their resource_registry. They would
> have to query and record nested_stack names just to refer to a given
> server resource. Each ResourceGroup nested stack would have to be
> individually represented, etc. Unless there is another way I'm
> overlooking.
>
> Whether or not the restricted_actions approach is taken, is Heat
> interested in this functionality natively? I think it would make for a
> much cleaner implementation than something TripleO specific. I can
> work on a Heat spec if there's interest, though I'd like to get some
> early feedback.
>
> Thanks.






[openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-07 Thread James Slagle
I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/
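Roughly, the nested-stack approach boils down to computing a reduced `servers` map before it reaches the SoftwareDeploymentGroup. The Python below sketches that computation only; the patch itself uses a yaql expression, which is not reproduced here, and the `disabled` parameter and server names are hypothetical.

```python
from typing import Dict, Iterable


def enabled_servers(servers: Dict[str, str], disabled: Iterable[str]) -> Dict[str, str]:
    """Return the servers map minus any entry whose name is in `disabled`.

    `servers` maps names to server IDs, the shape SoftwareDeploymentGroup's
    `servers` property takes.
    """
    skip = set(disabled)
    return {name: server_id for name, server_id in servers.items() if name not in skip}


servers = {"0": "uuid-a", "1": "uuid-b", "2": "uuid-c"}
print(enabled_servers(servers, ["1"]))
# {'0': 'uuid-a', '2': 'uuid-c'}
```

If the reduced map is passed straight to the group, the excluded member would presumably disappear from the group's nested stack rather than simply being skipped, which is part of why this is harder to get right than it looks.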

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update were synonyms for replace_fail/update_fail to
maintain backwards compatibility.
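To make the proposed values concrete, here is a hypothetical parser (not Heat code) that normalizes each proposed restricted_actions value to an (action, behaviour) pair, treating the bare `replace` and `update` as the `_fail` variants for backwards compatibility, as proposed above.

```python
from typing import Tuple

ACTIONS = ("replace", "update")
BEHAVIOURS = ("fail", "ignore")


def parse_restricted_action(value: str) -> Tuple[str, str]:
    """Map a proposed restricted_actions value to (action, behaviour).

    Bare 'replace'/'update' keep today's semantics, i.e. they fail the stack.
    """
    action, sep, behaviour = value.partition("_")
    if not sep:
        behaviour = "fail"  # backwards-compatible synonym
    if action not in ACTIONS or behaviour not in BEHAVIOURS:
        raise ValueError(f"unknown restricted action: {value!r}")
    return action, behaviour


for v in ("replace_fail", "replace_ignore", "update_fail", "update_ignore", "replace", "update"):
    print(v, "->", parse_restricted_action(v))
```

A caller would then fail the stack for `fail` and skip the resource for `ignore`; the skip branch is where the bookkeeping concerns raised elsewhere in this thread come in.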

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on a
Server resource before triggering an updated deployment for a given
server.

It also might be nice to allow specifying restricted_actions on the
server's name property (which typically is the hostname) instead of
having to use the resource name. The reason is that it is not
really feasible to expect operators/users to have to represent the
full nested_stack structure in their resource_registry. They would
have to query and record nested_stack names just to refer to a given
server resource. Each ResourceGroup nested stack would have to be
individually represented, etc. Unless there is another way I'm
overlooking.
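To illustrate the lookup problem: the sketch below models the stack tree as nested dicts and walks it to find the resource whose `name` property matches a hostname, returning the chain of nested resource names that a resource_registry entry would otherwise have to spell out. The data layout and names are hypothetical; a real implementation would have to query the nested stacks through Heat rather than walk an in-memory dict.

```python
from typing import Any, Dict, List, Optional


def find_resource_path(
    stack: Dict[str, Any], hostname: str, path: Optional[List[str]] = None
) -> Optional[List[str]]:
    """Depth-first search for the resource whose `name` property is `hostname`.

    Returns the chain of resource names from the top-level stack down to the
    server, i.e. what an operator would otherwise have to write out by hand.
    """
    path = path or []
    for res_name, res in stack.get("resources", {}).items():
        if res.get("properties", {}).get("name") == hostname:
            return path + [res_name]
        nested = res.get("nested_stack")
        if nested:
            found = find_resource_path(nested, hostname, path + [res_name])
            if found:
                return found
    return None


def member(hostname: str) -> Dict[str, Any]:
    # One ResourceGroup member: a nested stack holding a single server resource.
    return {"nested_stack": {"resources": {"NovaCompute": {"properties": {"name": hostname}}}}}


overcloud = {
    "resources": {
        "Compute": {
            "nested_stack": {
                "resources": {
                    "0": member("overcloud-novacompute-0"),
                    "1": member("overcloud-novacompute-1"),
                }
            }
        }
    }
}

print(find_resource_path(overcloud, "overcloud-novacompute-1"))
# ['Compute', '1', 'NovaCompute']
```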

Whether or not the restricted_actions approach is taken, is Heat
interested in this functionality natively? I think it would make for a
much cleaner implementation than something TripleO specific. I can
work on a Heat spec if there's interest, though I'd like to get some
early feedback.

Thanks.

-- 
-- James Slagle
--
