Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-09 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2015-01-09 14:57:21 -0800:
> On 08/01/15 05:39, Anant Patil wrote:
> > 1. The stack was failing when there were single disjoint resources or
> > just one resource in the template. The graph did not include this resource
> > due to a minor bug in dependency_names(). I have added a test case and
> > fix here:
> > https://github.com/anantpatil/heat-convergence-prototype/commit/b58abd77cf596475ecf3f19ed38adf8ad3bb6b3b
> 
> Thanks, sorry about that! I will push a patch to fix it up.
> 
> > 2. The resource graph is created with keys in both forward order
> > traversal and reverse order traversal and the update will finish the
> > forward order and attempt the reverse order. If this is the case, then
> > the update-replaced resources will be deleted before the update is
> > complete and if the update fails, the old resource is not available for
> > roll-back; a new resource has to be created then. I have added a test
> > case at the above mentioned location.
> >
> > In our PoC, the updates (concurrent updates) won't remove an
> > update-replaced resource until all the resources are updated and the
> > resource clean-up phase is started.
> 
> Hmmm, this is a really interesting question actually. That's certainly 
> not how Heat works at the moment; we've always assumed that rollback is 
> "best-effort" at recovering the exact resources you had before. It would 
> be great to have users weigh in on how they expect this to behave. I'm 
> curious now what CloudFormation does.
> 
> I'm reluctant to change it though because I'm pretty sure this is 
> definitely *not* how you would want e.g. a rolling update of an 
> autoscaling group to happen.
> 
> > It is unacceptable to remove the old
> > resource to be rolled-back to since it may have changes which the user
> > doesn't want to lose;
> 
> If they didn't want to lose it they shouldn't have tried an update that 
> would replace it. If an update causes a replacement or an interruption 
> to service then I consider the same fair game for the rollback - the 
> user has already given us permission for that kind of change. (Whether 
> the user's consent was informed is a separate question, addressed by 
> Ryan's update-preview work.)
> 

In the original vision we had for using scaled groups to manage, say,
nova-compute nodes, you definitely can't "create" new servers, so you
can't just create all the new instances without de-allocating some.

That said, that's why we are using in-place methods like rebuild.

I think it would be acceptable to have cleanup run asynchronously,
and to have rollback re-create anything that has already been cleaned up.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-09 Thread Zane Bitter

On 08/01/15 05:39, Anant Patil wrote:

1. The stack was failing when there were single disjoint resources or
just one resource in the template. The graph did not include this resource
due to a minor bug in dependency_names(). I have added a test case and
fix here:
https://github.com/anantpatil/heat-convergence-prototype/commit/b58abd77cf596475ecf3f19ed38adf8ad3bb6b3b


Thanks, sorry about that! I will push a patch to fix it up.


2. The resource graph is created with keys in both forward order
traversal and reverse order traversal and the update will finish the
forward order and attempt the reverse order. If this is the case, then
the update-replaced resources will be deleted before the update is
complete and if the update fails, the old resource is not available for
roll-back; a new resource has to be created then. I have added a test
case at the above mentioned location.

In our PoC, the updates (concurrent updates) won't remove an
update-replaced resource until all the resources are updated and the
resource clean-up phase is started.


Hmmm, this is a really interesting question actually. That's certainly 
not how Heat works at the moment; we've always assumed that rollback is 
"best-effort" at recovering the exact resources you had before. It would 
be great to have users weigh in on how they expect this to behave. I'm 
curious now what CloudFormation does.


I'm reluctant to change it though because I'm pretty sure this is 
definitely *not* how you would want e.g. a rolling update of an 
autoscaling group to happen.



It is unacceptable to remove the old
resource to be rolled-back to since it may have changes which the user
doesn't want to lose;


If they didn't want to lose it they shouldn't have tried an update that 
would replace it. If an update causes a replacement or an interruption 
to service then I consider the same fair game for the rollback - the 
user has already given us permission for that kind of change. (Whether 
the user's consent was informed is a separate question, addressed by 
Ryan's update-preview work.)



and that's probably why they use the rollback
flag.


I don't think there's any basis for saying that. People use the rollback 
flag because they want the stack left in a consistent state even if an 
error occurs.


cheers,
Zane.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-08 Thread Anant Patil
On 08-Jan-15 16:09, Anant Patil wrote:
> On 16-Dec-14 09:41, Zane Bitter wrote:
>> On 15/12/14 09:32, Anant Patil wrote:
>>> On 12-Dec-14 06:29, Zane Bitter wrote:
 On 11/12/14 01:14, Anant Patil wrote:
> On 04-Dec-14 10:49, Zane Bitter wrote:
>> On 01/12/14 02:02, Anant Patil wrote:
>>> On GitHub:https://github.com/anantpatil/heat-convergence-poc
>>
>> I'm trying to review this code at the moment, and finding some stuff I
>> don't understand:
>>
>> https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
>>
>> This appears to loop through all of the resources *prior* to kicking off
>> any actual updates to check if the resource will change. This is
>> impossible to do in general, since a resource may obtain a property
>> value from an attribute of another resource and there is no way to know
>> whether an update to said other resource would cause a change in the
>> attribute value.
>>
>> In addition, no attempt to catch UpdateReplace is made. Although that
>> looks like a simple fix, I'm now worried about the level to which this
>> code has been tested.
>>
> We were working on a new branch and as we discussed on Skype, we have
> handled all these cases. Please have a look at our current branch:
> https://github.com/anantpatil/heat-convergence-poc/tree/graph-version
>
> When a new resource is taken for convergence, its children are loaded
> and the resource definition is re-parsed. The frozen resource definition
> will have all the "get_attr" resolved.
>
>>
>> I'm also trying to wrap my head around how resources are cleaned up in
>> dependency order. If I understand correctly, you store in the
>> ResourceGraph table the dependencies between various resource names in
>> the current template (presumably there could also be some left around
>> from previous templates too?). For each resource name there may be a
>> number of rows in the Resource table, each with an incrementing version.
>> As far as I can tell though, there's nowhere that the dependency graph
>> for _previous_ templates is persisted? So if the dependency order
>> changes in the template we have no way of knowing the correct order to
>> clean up in any more? (There's not even a mechanism to associate a
>> resource version with a particular template, which might be one avenue
>> by which to recover the dependencies.)
>>
>> I think this is an important case we need to be able to handle, so I
>> added a scenario to my test framework to exercise it and discovered that
>> my implementation was also buggy. Here's the fix:
>> https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40
>>
>
> Thanks for pointing this out Zane. We too had a buggy implementation for
> handling inverted dependency. I had a hard look at our algorithm where
> we were continuously merging the edges from new template into the edges
> from previous updates. It was an optimized way of traversing the graph
> in both forward and reverse order without missing any resources. But,
> when the dependencies are inverted, this wouldn't work.
>
> We have changed our algorithm. The changes in edges are noted down in
> DB, only the delta of edges from previous template is calculated and
> kept. At any given point of time, the graph table has all the edges from
> current template and delta from previous templates. Each edge has
> template ID associated with it.

 The thing is, the cleanup dependencies aren't really about the template.
 The real resources really depend on other real resources. You can't
 delete a Volume before its VolumeAttachment, not because it says so in
 the template but because it will fail if you try. The template can give
 us a rough guide in advance to what those dependencies will be, but if
 that's all we keep then we are discarding information.

 There may be multiple versions of a resource corresponding to one
 template version. Even worse, the actual dependencies of a resource
 change on a smaller time scale than an entire stack update (this is the
 reason the current implementation updates the template one resource at a
 time as we go).

>>>
>>> Absolutely! The edges from the template are kept only for reference
>>> purposes. When we have a resource in the new template, its template ID will
>>> also be marked to the current template. At any point of time, the realized
>>> resource will be from the current template, even if it were found in previous
>>> templates. The template ID "moves" for a resource if it is found.
>>
>> In theory (disclaimer: I didn't implement this yet) it can change on an 
>> even smaller timescale than that. The existing plugins are something of 
>> a black box to us: if a failure occ

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-08 Thread Anant Patil
On 16-Dec-14 09:41, Zane Bitter wrote:
> On 15/12/14 09:32, Anant Patil wrote:
>> On 12-Dec-14 06:29, Zane Bitter wrote:
>>> On 11/12/14 01:14, Anant Patil wrote:
 On 04-Dec-14 10:49, Zane Bitter wrote:
> On 01/12/14 02:02, Anant Patil wrote:
>> On GitHub:https://github.com/anantpatil/heat-convergence-poc
>
> I'm trying to review this code at the moment, and finding some stuff I
> don't understand:
>
> https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
>
> This appears to loop through all of the resources *prior* to kicking off
> any actual updates to check if the resource will change. This is
> impossible to do in general, since a resource may obtain a property
> value from an attribute of another resource and there is no way to know
> whether an update to said other resource would cause a change in the
> attribute value.
>
> In addition, no attempt to catch UpdateReplace is made. Although that
> looks like a simple fix, I'm now worried about the level to which this
> code has been tested.
>
 We were working on a new branch and as we discussed on Skype, we have
 handled all these cases. Please have a look at our current branch:
 https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

 When a new resource is taken for convergence, its children are loaded
 and the resource definition is re-parsed. The frozen resource definition
 will have all the "get_attr" resolved.

>
> I'm also trying to wrap my head around how resources are cleaned up in
> dependency order. If I understand correctly, you store in the
> ResourceGraph table the dependencies between various resource names in
> the current template (presumably there could also be some left around
> from previous templates too?). For each resource name there may be a
> number of rows in the Resource table, each with an incrementing version.
> As far as I can tell though, there's nowhere that the dependency graph
> for _previous_ templates is persisted? So if the dependency order
> changes in the template we have no way of knowing the correct order to
> clean up in any more? (There's not even a mechanism to associate a
> resource version with a particular template, which might be one avenue
> by which to recover the dependencies.)
>
> I think this is an important case we need to be able to handle, so I
> added a scenario to my test framework to exercise it and discovered that
> my implementation was also buggy. Here's the fix:
> https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40
>

 Thanks for pointing this out Zane. We too had a buggy implementation for
 handling inverted dependency. I had a hard look at our algorithm where
 we were continuously merging the edges from new template into the edges
 from previous updates. It was an optimized way of traversing the graph
 in both forward and reverse order without missing any resources. But,
 when the dependencies are inverted, this wouldn't work.

 We have changed our algorithm. The changes in edges are noted down in
 DB, only the delta of edges from previous template is calculated and
 kept. At any given point of time, the graph table has all the edges from
 current template and delta from previous templates. Each edge has
 template ID associated with it.
>>>
>>> The thing is, the cleanup dependencies aren't really about the template.
>>> The real resources really depend on other real resources. You can't
>>> delete a Volume before its VolumeAttachment, not because it says so in
>>> the template but because it will fail if you try. The template can give
>>> us a rough guide in advance to what those dependencies will be, but if
>>> that's all we keep then we are discarding information.
>>>
>>> There may be multiple versions of a resource corresponding to one
>>> template version. Even worse, the actual dependencies of a resource
>>> change on a smaller time scale than an entire stack update (this is the
>>> reason the current implementation updates the template one resource at a
>>> time as we go).
>>>
>>
>> Absolutely! The edges from the template are kept only for reference
>> purposes. When we have a resource in the new template, its template ID will
>> also be marked to the current template. At any point of time, the realized
>> resource will be from the current template, even if it were found in previous
>> templates. The template ID "moves" for a resource if it is found.
> 
> In theory (disclaimer: I didn't implement this yet) it can change on an 
> even smaller timescale than that. The existing plugins are something of 
> a black box to us: if a failure occurs we don't necessarily know whether 
> the real-world dependency is on the old or new version of another resource.
> 

Yes, a

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Zane Bitter

On 15/12/14 07:47, Murugan, Visnusaran wrote:

We have similar questions regarding other areas in your implementation,
which we believe will be resolved once we understand the outline of your
implementation. It is difficult to get a hold on your approach just by
looking at code. Docstrings / an Etherpad will help.


I added a bunch of extra docstrings and comments:

https://github.com/zaneb/heat-convergence-prototype/commit/5d79e009196dc224bd588e19edef5f0939b04607

I also implemented a --pdb option that will automatically set 
breakpoints at the beginning of all of the asynchronous events, so that 
you'll be dropped into the debugger and can single-step through the 
code, look at variables and so on:


https://github.com/zaneb/heat-convergence-prototype/commit/2a7a56dde21cad979fae25acc9fb01c6b4d9c6f7
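
(For anyone curious how that works, here is a minimal sketch of the idea - the
names are made up, not the actual prototype code: wrap each asynchronous event
handler so that, when the --pdb flag is set, the debugger is entered before
the handler runs.)

    import functools
    import pdb

    # Hypothetical module-level flag, set from a --pdb command-line option.
    DEBUG_EVENTS = False

    def break_on_event(handler):
        """Drop into the debugger before running an async event handler."""
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            if DEBUG_EVENTS:
                pdb.set_trace()   # single-step from here to follow the event
            return handler(*args, **kwargs)
        return wrapper

    @break_on_event
    def check_resource(resource_key, graph):
        # ...the real handler body would go here...
        print('checking %s' % (resource_key,))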

I hope that helps. If you have more questions, please feel free to ask.

cheers,
Zane.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-16 07:36:58 -0800:
> On 16-Dec-14 00:59, Clint Byrum wrote:
> > Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
> >> On 13-Dec-14 05:42, Zane Bitter wrote:
> >>> On 12/12/14 05:29, Murugan, Visnusaran wrote:
> >>>>
> >>>>
> >>>>> -Original Message-
> >>>>> From: Zane Bitter [mailto:zbit...@redhat.com]
> >>>>> Sent: Friday, December 12, 2014 6:37 AM
> >>>>> To: openstack-dev@lists.openstack.org
> >>>>> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> >>>>> showdown
> >>>>>
> >>>>> On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >>>>>>>> [Murugan, Visnusaran]
> >>>>>>>> In case of rollback where we have to cleanup earlier version of
> >>>>>>>> resources,
> >>>>>>> we could get the order from old template. We'd prefer not to have a
> >>>>>>> graph table.
> >>>>>>>
> >>>>>>> In theory you could get it by keeping old templates around. But that
> >>>>>>> means keeping a lot of templates, and it will be hard to keep track
> >>>>>>> of when you want to delete them. It also means that when starting an
> >>>>>>> update you'll need to load every existing previous version of the
> >>>>>>> template in order to calculate the dependencies. It also leaves the
> >>>>>>> dependencies in an ambiguous state when a resource fails, and
> >>>>>>> although that can be worked around it will be a giant pain to 
> >>>>>>> implement.
> >>>>>>>
> >>>>>>
> >>>>>> Agree that looking at all templates for a delete is not good. But
> >>>>>> barring complexity, we feel we could achieve it by way of having an
> >>>>>> update and a delete stream for a stack update operation. I will
> >>>>>> elaborate in detail in the etherpad sometime tomorrow :)
> >>>>>>
> >>>>>>> I agree that I'd prefer not to have a graph table. After trying a
> >>>>>>> couple of different things I decided to store the dependencies in the
> >>>>>>> Resource table, where we can read or write them virtually for free
> >>>>>>> because it turns out that we are always reading or updating the
> >>>>>>> Resource itself at exactly the same time anyway.
> >>>>>>>
> >>>>>>
> >>>>>> Not sure how this will work in an update scenario when a resource does
> >>>>>> not change and its dependencies do.
> >>>>>
> >>>>> We'll always update the requirements, even when the properties don't
> >>>>> change.
> >>>>>
> >>>>
> >>>> Can you elaborate a bit on rollback.
> >>>
> >>> I didn't do anything special to handle rollback. It's possible that we 
> >>> need to - obviously the difference in the UpdateReplace + rollback case 
> >>> is that the replaced resource is now the one we want to keep, and yet 
> >>> the replaced_by/replaces dependency will force the newer (replacement) 
> >>> resource to be checked for deletion first, which is an inversion of the 
> >>> usual order.
> >>>
> >>
> >> This is where the version is so handy! For UpdateReplaced ones, there is
> >> an older version to go back to. This version could just be template ID,
> >> as I mentioned in another e-mail. All resources are at the current
> >> template ID if they are found in the current template, even if there is
> >> no need to update them. Otherwise, they need to be cleaned-up in the
> >> order given in the previous templates.
> >>
> >> I think the template ID is used as a version as far as I can see in Zane's
> >> PoC. If the resource template key doesn't match the current template
> >> key, the resource is deleted. The version is a misnomer here, but that
> >> field (template id) is used as though we had versions of resources.
> >>
> >>> However, I tried to think of a scenario where that would cause problems 
> >>> and I couldn't come up with one. Provi

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Gurjar, Unmesh

> -Original Message-
> From: Zane Bitter [mailto:zbit...@redhat.com]
> Sent: Thursday, December 18, 2014 7:42 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> On 17/12/14 13:05, Gurjar, Unmesh wrote:
> >> I'm storing a tuple of its name and database ID. The data structure
> >> is resource.GraphKey. I was originally using the name for something,
> >> but I suspect I could probably drop it now and just store the
> >> database ID, but I haven't tried it yet. (Having the name in there
> >> definitely makes debugging more pleasant though ;)
> >>
> >
> > I agree, having the name might come in handy while debugging!
> >
> >> When I build the traversal graph each node is a tuple of the GraphKey
> >> and a boolean to indicate whether it corresponds to an update or a
> >> cleanup operation (both can appear for a single resource in the same
> graph).
> >
> > Just to confirm my understanding, cleanup operation takes care of both:
> > 1. resources which are deleted as a part of update and 2. previous
> > versioned resource which was updated by replacing with a new resource
> > (UpdateReplace scenario)
> 
> Yes, correct. Also:
> 
> 3. resource versions which failed to delete for whatever reason on a previous
> traversal
> 
> > Also, the cleanup operation is performed after the update completes
> successfully.
> 
> NO! They are not separate things!
> 
> https://github.com/openstack/heat/blob/stable/juno/heat/engine/update.
> py#L177-L198
> 
> >>> If I am correct, you are updating all resources on update regardless
> >>> of their change, which will be inefficient if the stack contains a
> >>> million resources.
> >>
> >> I'm calling update() on all resources regardless of change, but
> >> update() will only call handle_update() if something has changed
> >> (unless the plugin has overridden Resource._needs_update()).
> >>
> >> There's no way to know whether a resource needs to be updated before
> >> you're ready to update it, so I don't think of this as 'inefficient', just
> 'correct'.
> >>
> >>> We have similar questions regarding other areas in your
> >>> implementation, which we believe will be resolved once we understand
> >>> the outline of your implementation. It is difficult to get a hold on
> >>> your approach just by looking at code. Docstrings / an Etherpad will help.
> >>>
> >>>
> >>> About streams: yes, in a million-resource stack the data will be
> >>> huge, but less than the template.
> >>
> >> No way, it's O(n^3) (cubed!) in the worst case to store streams for
> >> each resource.
> >>
> >>> Also this stream is stored
> >>> only in IN_PROGRESS resources.
> >>
> >> Now I'm really confused. Where does it come from if the resource
> >> doesn't get it until it's already in progress? And how will that 
> >> information
> help it?
> >>
> >
> > When an operation on stack is initiated, the stream will be identified.
> 
> OK, this may be one of the things I was getting confused about - I thought a
> 'stream' belonged to one particular resource and just contained all of the
> paths to reach that resource. But here it seems like you're saying that a
> 'stream' is a representation of the entire graph?
> So it's essentially just a gratuitously bloated NIH serialisation of the
> Dependencies graph?
> 
> > To begin
> > the operation, the action is initiated on the leaf (or root)
> > resource(s) and the stream is stored (only) in this/these IN_PROGRESS
> resource(s).
> 
> How does that work? Does it get deleted again when the resource moves to
> COMPLETE?
> 

Yes, IMO, upon resource completion the stream can be deleted. I do not
foresee any situation wherein storing the stream is required, since when
another operation is initiated on the stack, that template will be parsed
and a new stream identified and used.

> > The stream should then keep getting passed to the next/previous level
> > of resource(s) as and when the dependencies for the next/previous level
> of resource(s) are met.
> 
> That sounds... identical to the way it's implemented in my prototype (passing
> a serialisation of the graph down through the notification triggers), except 
> for
> the part about storing it in the Resource table.
> Why would we pers

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-17 Thread Zane Bitter

On 17/12/14 13:05, Gurjar, Unmesh wrote:

I'm storing a tuple of its name and database ID. The data structure is
resource.GraphKey. I was originally using the name for something, but I
suspect I could probably drop it now and just store the database ID, but I
haven't tried it yet. (Having the name in there definitely makes debugging
more pleasant though ;)



I agree, having the name might come in handy while debugging!


When I build the traversal graph each node is a tuple of the GraphKey and a
boolean to indicate whether it corresponds to an update or a cleanup
operation (both can appear for a single resource in the same graph).


Just to confirm my understanding, cleanup operation takes care of both:
1. resources which are deleted as a part of update and
2. previous versioned resource which was updated by replacing with a new
resource (UpdateReplace scenario)


Yes, correct. Also:

3. resource versions which failed to delete for whatever reason on a 
previous traversal



Also, the cleanup operation is performed after the update completes 
successfully.


NO! They are not separate things!

https://github.com/openstack/heat/blob/stable/juno/heat/engine/update.py#L177-L198


If I am correct, you are updating all resources on update regardless
of their change, which will be inefficient if the stack contains a million
resources.


I'm calling update() on all resources regardless of change, but update() will
only call handle_update() if something has changed (unless the plugin has
overridden Resource._needs_update()).

There's no way to know whether a resource needs to be updated before
you're ready to update it, so I don't think of this as 'inefficient', just 
'correct'.


We have similar questions regarding other areas in your
implementation, which we believe will be resolved once we understand the
outline of your implementation. It is difficult to get a hold on your
approach just by looking at code. Docstrings / an Etherpad will help.



About streams: yes, in a million-resource stack the data will be huge, but
less than the template.

No way, it's O(n^3) (cubed!) in the worst case to store streams for each
resource.


Also this stream is stored
only in IN_PROGRESS resources.


Now I'm really confused. Where does it come from if the resource doesn't
get it until it's already in progress? And how will that information help it?



When an operation on stack is initiated, the stream will be identified.


OK, this may be one of the things I was getting confused about - I
thought a 'stream' belonged to one particular resource and just contained
all of the paths to reach that resource. But here it seems like
you're saying that a 'stream' is a representation of the entire graph? 
So it's essentially just a gratuitously bloated NIH serialisation of the 
Dependencies graph?



To begin
the operation, the action is initiated on the leaf (or root) resource(s) and the
stream is stored (only) in this/these IN_PROGRESS resource(s).


How does that work? Does it get deleted again when the resource moves to 
COMPLETE?



The stream should then keep getting passed to the next/previous level of 
resource(s) as
and when the dependencies for the next/previous level of resource(s) are met.


That sounds... identical to the way it's implemented in my prototype 
(passing a serialisation of the graph down through the notification 
triggers), except for the part about storing it in the Resource table. 
Why would we persist to the database data that we only need for the 
duration that we already have it in memory anyway?


If we're going to persist it we should do so once, in the Stack table, 
at the time that we're preparing to start the traversal.
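
Concretely, something along these lines (a sketch with made-up names -
current_traversal, store(), requires - not actual Heat code):

    import collections

    FakeResource = collections.namedtuple('FakeResource', 'name requires')

    class FakeStack(object):
        current_traversal = None

        def store(self):
            pass   # stand-in for writing the Stack row once

    def start_traversal(stack, resources):
        """Persist the traversal graph once, on the Stack, then kick off."""
        graph = {res.name: sorted(res.requires) for res in resources}
        stack.current_traversal = graph    # one write, in the Stack table
        stack.store()
        # ...then notify the leaves; workers read the graph from the Stack
        # rather than persisting per-resource copies of it.
        return [name for name, reqs in graph.items() if not reqs]

    print(start_traversal(FakeStack(),
                          [FakeResource('volume', []),
                           FakeResource('attachment', ['volume'])]))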



The reason to have the entire dependency list is to reduce DB queries
during a stack update.

But we never need to know that. We only need to know what just happened
and what to do next.



As mentioned earlier, each level of resources in a graph passes on the
dependency list/stream to the next/previous level of resources. This
information should then be used to determine what is to be done next and
drive the operation to completion.


Why would we store *and* forward?


When you have a singular dependency on each resource, similar to your
implementation, we will end up loading Dependencies one at a time and
altering almost every resource's dependencies regardless of whether they
changed.


Regarding a 2-template approach for delete, it is not actually 2 different
templates. It's just that we have a delete stream to be taken up post-update.

That would be a regression from Heat's current behaviour, where we start
cleaning up resources as soon as they have nothing depending on them.
There's not even a reason to make it worse than what we already have,
because it's actually a lot _easier_ to treat update and clean up as the same
kind of operation and throw both into the same big graph. The dual
implementations and all of the edge cases go away and you can just trust in
the graph traversal to do the Right Thing in th

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-17 Thread Gurjar, Unmesh
> -Original Message-
> From: Zane Bitter [mailto:zbit...@redhat.com]
> Sent: Tuesday, December 16, 2014 9:43 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> On 15/12/14 07:47, Murugan, Visnusaran wrote:
> > Hi Zane,
> >
> > We have been going through this chain for quite some time now and we
> still feel a disconnect in our understanding.
> 
> Yes, I thought last week that we were on the same page, but now it looks like
> we're drifting off again :(
> 
> > Can you put up an etherpad where we can understand your approach.
> 
> Maybe you could put up an etherpad with your questions. Practically all of
> the relevant code is in Stack._create_or_update, Stack._dependencies and
> Converger.check_resource. That's 134 lines of code by my count.
> There's not a lot more I can usefully say about it without knowing which parts
> exactly you're stuck on, but I can definitely answer specific questions.
> 
> > For example: for storing resource dependencies, are you storing its
> > name/version tuple or just its ID?
> 
> I'm storing a tuple of its name and database ID. The data structure is
> resource.GraphKey. I was originally using the name for something, but I
> suspect I could probably drop it now and just store the database ID, but I
> haven't tried it yet. (Having the name in there definitely makes debugging
> more pleasant though ;)
> 

I agree, having the name might come in handy while debugging!

> When I build the traversal graph each node is a tuple of the GraphKey and a
> boolean to indicate whether it corresponds to an update or a cleanup
> operation (both can appear for a single resource in the same graph).

Just to confirm my understanding, cleanup operation takes care of both:
1. resources which are deleted as a part of update and
2. previous versioned resource which was updated by replacing with a new 
resource (UpdateReplace scenario)
Also, the cleanup operation is performed after the update completes 
successfully. 

> 
> > If I am correct, you are updating all resources on update regardless
> > of their change, which will be inefficient if the stack contains a
> > million resources.
> 
> I'm calling update() on all resources regardless of change, but update() will
> only call handle_update() if something has changed (unless the plugin has
> overridden Resource._needs_update()).
> 
> There's no way to know whether a resource needs to be updated before
> you're ready to update it, so I don't think of this as 'inefficient', just 
> 'correct'.
> 
> > We have similar questions regarding other areas in your
> > implementation, which we believe will be resolved once we understand the
> > outline of your implementation. It is difficult to get a hold on your
> > approach just by looking at code. Docstrings / an Etherpad will help.
> >
> >
> > About streams: yes, in a million-resource stack the data will be huge,
> > but less than the template.
> 
> No way, it's O(n^3) (cubed!) in the worst case to store streams for each
> resource.
> 
> > Also this stream is stored
> > only in IN_PROGRESS resources.
> 
> Now I'm really confused. Where does it come from if the resource doesn't
> get it until it's already in progress? And how will that information help it?
> 

When an operation on stack is initiated, the stream will be identified. To begin
the operation, the action is initiated on the leaf (or root) resource(s) and the
stream is stored (only) in this/these IN_PROGRESS resource(s).
The stream should then keep getting passed to the next/previous level of 
resource(s) as
and when the dependencies for the next/previous level of resource(s) are met.

> > The reason to have the entire dependency list is to reduce DB queries
> > during a stack update.
> 
> But we never need to know that. We only need to know what just happened
> and what to do next.
> 

As mentioned earlier, each level of resources in a graph passes on the
dependency list/stream to the next/previous level of resources. This
information should then be used to determine what is to be done next and
drive the operation to completion.

> > When you have a singular dependency on each resource, similar to your
> > implementation, we will end up loading Dependencies one at a time and
> > altering almost every resource's dependencies regardless of whether they changed.
> >
> > Regarding a 2-template approach for delete, it is not actually 2 different
> > templates. It's just that we have a delete stream to be taken up
> > post-update.
> 
> That would be a regression from Heat's current behaviour, where we st

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-16 Thread Anant Patil
On 16-Dec-14 00:59, Clint Byrum wrote:
> Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
>> On 13-Dec-14 05:42, Zane Bitter wrote:
>>> On 12/12/14 05:29, Murugan, Visnusaran wrote:
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Zane Bitter [mailto:zbit...@redhat.com]
>>>>> Sent: Friday, December 12, 2014 6:37 AM
>>>>> To: openstack-dev@lists.openstack.org
>>>>> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
>>>>> showdown
>>>>>
>>>>> On 11/12/14 08:26, Murugan, Visnusaran wrote:
>>>>>>>> [Murugan, Visnusaran]
>>>>>>>> In case of rollback where we have to cleanup earlier version of
>>>>>>>> resources,
>>>>>>> we could get the order from old template. We'd prefer not to have a
>>>>>>> graph table.
>>>>>>>
>>>>>>> In theory you could get it by keeping old templates around. But that
>>>>>>> means keeping a lot of templates, and it will be hard to keep track
>>>>>>> of when you want to delete them. It also means that when starting an
>>>>>>> update you'll need to load every existing previous version of the
>>>>>>> template in order to calculate the dependencies. It also leaves the
>>>>>>> dependencies in an ambiguous state when a resource fails, and
>>>>>>> although that can be worked around it will be a giant pain to implement.
>>>>>>>
>>>>>>
> >>>>>> Agree that looking at all templates for a delete is not good. But
> >>>>>> barring complexity, we feel we could achieve it by way of having an
>>>>>> update and a delete stream for a stack update operation. I will
>>>>>> elaborate in detail in the etherpad sometime tomorrow :)
>>>>>>
>>>>>>> I agree that I'd prefer not to have a graph table. After trying a
>>>>>>> couple of different things I decided to store the dependencies in the
>>>>>>> Resource table, where we can read or write them virtually for free
>>>>>>> because it turns out that we are always reading or updating the
>>>>>>> Resource itself at exactly the same time anyway.
>>>>>>>
>>>>>>
>>>>>> Not sure how this will work in an update scenario when a resource does
>>>>>> not change and its dependencies do.
>>>>>
>>>>> We'll always update the requirements, even when the properties don't
>>>>> change.
>>>>>
>>>>
>>>> Can you elaborate a bit on rollback.
>>>
>>> I didn't do anything special to handle rollback. It's possible that we 
>>> need to - obviously the difference in the UpdateReplace + rollback case 
>>> is that the replaced resource is now the one we want to keep, and yet 
>>> the replaced_by/replaces dependency will force the newer (replacement) 
>>> resource to be checked for deletion first, which is an inversion of the 
>>> usual order.
>>>
>>
>> This is where the version is so handy! For UpdateReplaced ones, there is
>> an older version to go back to. This version could just be template ID,
>> as I mentioned in another e-mail. All resources are at the current
>> template ID if they are found in the current template, even if there is
>> no need to update them. Otherwise, they need to be cleaned-up in the
>> order given in the previous templates.
>>
>> I think the template ID is used as a version as far as I can see in Zane's
>> PoC. If the resource template key doesn't match the current template
>> key, the resource is deleted. The version is a misnomer here, but that
>> field (template id) is used as though we had versions of resources.
>>
>>> However, I tried to think of a scenario where that would cause problems 
>>> and I couldn't come up with one. Provided we know the actual, real-world 
>>> dependencies of each resource I don't think the ordering of those two 
>>> checks matters.
>>>
>>> In fact, I currently can't think of a case where the dependency order 
>>> between replacement and replaced resources matters at all. It matters in 
>>> the current Heat implementation because resources are artificially 
>>> segmented into the current and bac

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 10:15, Anant Patil wrote:

On 13-Dec-14 05:42, Zane Bitter wrote:

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of
resources,

we could get the order from old template. We'd prefer not to have a
graph table.

In theory you could get it by keeping old templates around. But that
means keeping a lot of templates, and it will be hard to keep track
of when you want to delete them. It also means that when starting an
update you'll need to load every existing previous version of the
template in order to calculate the dependencies. It also leaves the
dependencies in an ambiguous state when a resource fails, and
although that can be worked around it will be a giant pain to implement.



Agree that looking at all templates for a delete is not good. But
barring complexity, we feel we could achieve it by way of having an
update and a delete stream for a stack update operation. I will
elaborate in detail in the etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a
couple of different things I decided to store the dependencies in the
Resource table, where we can read or write them virtually for free
because it turns out that we are always reading or updating the
Resource itself at exactly the same time anyway.



Not sure how this will work in an update scenario when a resource does
not change and its dependencies do.


We'll always update the requirements, even when the properties don't
change.



Can you elaborate a bit on rollback.


I didn't do anything special to handle rollback. It's possible that we
need to - obviously the difference in the UpdateReplace + rollback case
is that the replaced resource is now the one we want to keep, and yet
the replaced_by/replaces dependency will force the newer (replacement)
resource to be checked for deletion first, which is an inversion of the
usual order.



This is where the version is so handy! For UpdateReplaced ones, there is


*sometimes*


an older version to go back to. This version could just be template ID,
as I mentioned in another e-mail. All resources are at the current
template ID if they are found in the current template, even if there is
no need to update them. Otherwise, they need to be cleaned-up in the
order given in the previous templates.

I think the template ID is used as a version as far as I can see in Zane's
PoC. If the resource template key doesn't match the current template
key, the resource is deleted. The version is a misnomer here, but that
field (template id) is used as though we had versions of resources.


Correct. Because if we had wanted to keep it, we would have updated it 
to the new template version already.



However, I tried to think of a scenario where that would cause problems
and I couldn't come up with one. Provided we know the actual, real-world
dependencies of each resource I don't think the ordering of those two
checks matters.

In fact, I currently can't think of a case where the dependency order
between replacement and replaced resources matters at all. It matters in
the current Heat implementation because resources are artificially
segmented into the current and backup stacks, but with a holistic view
of dependencies that may well not be required. I tried taking that line
out of the simulator code and all the tests still passed. If anybody can
think of a scenario in which it would make a difference, I would be very
interested to hear it.

In any event though, it should be no problem to reverse the direction of
that one edge in these particular circumstances if it does turn out to
be a problem.
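
(For illustration, with a plain dict-of-sets graph and made-up node names -
not the prototype code - flipping one edge is a small, local change:)

    def reverse_edge(graph, a, b):
        """Flip a single dependency edge in a {node: set(blockers)} graph.

        Sketch only: which edge to flip, and in which direction, would
        depend on the UpdateReplace + rollback case being handled.
        """
        if b in graph.get(a, set()):
            graph[a].discard(b)
            graph.setdefault(b, set()).add(a)

    # e.g. make the replaced resource's check wait on the replacement's
    # cleanup instead of the other way around.
    graph = {'replacement-cleanup': {'replaced-check'}, 'replaced-check': set()}
    reverse_edge(graph, 'replacement-cleanup', 'replaced-check')
    print(graph)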


We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.


Yeah, I initially ran into this problem too - you have a bunch of nodes
that are waiting on the current node, and now you have to go look them
all up in the database to see what else they're waiting on in order to
tell if they're ready to be triggered.

It turns out the answer is to distribute the writes but centralise the
reads. So at the start of the update, we read all of the Resources,
obtain their dependencies and build one central graph[1]. We then make
that graph available to each resource (either by passing it as a
notification parameter, or storing it somewhere central in the DB that
they will all have to read anyway, i.e. the Stack). But when we update a
dependency we don't update the central graph, we update the individual
Resource so there's no global lock required.

[1]
https://github.co
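
As a rough sketch of that read/write split (hypothetical names; the real code
lives in the prototype linked above): read every resource's stored
requirements once, build a single in-memory graph and hand it to each check,
while writes still go to the individual Resource rows.

    import collections

    FakeResource = collections.namedtuple('FakeResource', 'id requires')

    def build_traversal_graph(resources):
        """Centralise the reads: one pass over all resources -> one graph."""
        return {res.id: set(res.requires) for res in resources}

    def ready_nodes(graph, complete):
        """Nodes whose requirements are all complete and can be triggered."""
        return [node for node, deps in graph.items()
                if node not in complete and deps <= complete]

    # Example: volume (1) <- attachment (2) <- server (3)
    resources = [FakeResource(1, []), FakeResource(2, [1]), FakeResource(3, [2])]
    graph = build_traversal_graph(resources)
    print(ready_nodes(graph, complete=set()))      # [1]
    print(ready_nodes(graph, complete={1, 2}))     # [3]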

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 07:47, Murugan, Visnusaran wrote:

Hi Zane,

We have been going through this chain for quite some time now and we still feel 
a disconnect in our understanding.


Yes, I thought last week that we were on the same page, but now it looks 
like we're drifting off again :(



Can you put up an etherpad where we can understand your approach.


Maybe you could put up an etherpad with your questions. Practically all 
of the relevant code is in Stack._create_or_update, Stack._dependencies 
and Converger.check_resource. That's 134 lines of code by my count. 
There's not a lot more I can usefully say about it without knowing which 
parts exactly you're stuck on, but I can definitely answer specific 
questions.



For example: for storing resource dependencies, are you storing its
name/version tuple or just its ID?


I'm storing a tuple of its name and database ID. The data structure is 
resource.GraphKey. I was originally using the name for something, but I 
suspect I could probably drop it now and just store the database ID, but 
I haven't tried it yet. (Having the name in there definitely makes 
debugging more pleasant though ;)


When I build the traversal graph each node is a tuple of the GraphKey 
and a boolean to indicate whether it corresponds to an update or a 
cleanup operation (both can appear for a single resource in the same graph).
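
A tiny sketch of what those nodes look like (illustrative stand-ins, not the
prototype's exact definitions):

    import collections

    # Stand-in for resource.GraphKey: the resource name plus its database ID,
    # so two versions of "server" are distinct nodes in the graph.
    GraphKey = collections.namedtuple('GraphKey', ['name', 'database_id'])

    old_server = GraphKey('server', 7)
    new_server = GraphKey('server', 12)

    # Traversal-graph nodes: (key, is_update). The same resource can appear
    # twice in one graph - once for its update, once for its cleanup.
    update_node = (new_server, True)
    cleanup_node = (old_server, False)
    print(update_node, cleanup_node)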



If I am correct, you are updating all resources on update regardless
of their change, which will be inefficient if the stack contains a million
resources.


I'm calling update() on all resources regardless of change, but update() 
will only call handle_update() if something has changed (unless the 
plugin has overridden Resource._needs_update()).


There's no way to know whether a resource needs to be updated before 
you're ready to update it, so I don't think of this as 'inefficient', 
just 'correct'.
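
In other words, something like this (a hand-wavy sketch, not the real
heat.engine.resource.Resource class):

    class Resource(object):
        """Sketch only; property comparison stands in for the real check."""

        def __init__(self, properties):
            self.properties = properties

        def _needs_update(self, new_definition):
            # Plugins may override this; the default here just compares.
            return new_definition != self.properties

        def handle_update(self, new_definition):
            print('calling the real-world API with %s' % new_definition)

        def update(self, new_definition):
            # Always called for every resource in the traversal...
            if self._needs_update(new_definition):
                # ...but the expensive part only happens on a real change.
                self.handle_update(new_definition)
            self.properties = new_definition

    r = Resource({'flavor': 'm1.small'})
    r.update({'flavor': 'm1.small'})   # no-op
    r.update({'flavor': 'm1.large'})   # triggers handle_update()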



We have similar questions regarding other areas in your implementation,
which we believe will be resolved once we understand the outline of your
implementation. It is difficult to get a hold on your approach just by
looking at code. Docstrings / an Etherpad will help.


About streams: yes, in a million-resource stack the data will be huge, but
less than the template.


No way, it's O(n^3) (cubed!) in the worst case to store streams for each 
resource.



Also this stream is stored
only in IN_PROGRESS resources.


Now I'm really confused. Where does it come from if the resource doesn't 
get it until it's already in progress? And how will that information 
help it?



The reason to have the entire dependency list is to reduce DB queries
during a stack update.


But we never need to know that. We only need to know what just happened 
and what to do next.



When you have a singular dependency on each resource, similar to your
implementation, we will end up loading Dependencies one at a time and
altering almost every resource's dependencies regardless of whether they
changed.

Regarding a 2-template approach for delete, it is not actually 2 different
templates. It's just that we have a delete stream to be taken up post-update.


That would be a regression from Heat's current behaviour, where we start 
cleaning up resources as soon as they have nothing depending on them. 
There's not even a reason to make it worse than what we already have, 
because it's actually a lot _easier_ to treat update and clean up as the 
same kind of operation and throw both into the same big graph. The dual 
implementations and all of the edge cases go away and you can just trust 
in the graph traversal to do the Right Thing in the most parallel way 
possible.
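
To make "one big graph" concrete, here is a rough sketch (a hypothetical
helper using the (key, is_update) nodes mentioned earlier; not the
prototype's API):

    def combined_graph(new_deps, old_deps):
        """One traversal graph holding both update and cleanup nodes.

        new_deps / old_deps: {resource_key: set(keys it requires)}.
        Nodes are (key, is_update); the value is the set of nodes that
        must finish first. Illustrative only.
        """
        graph = {}
        # Updates run in forward dependency order.
        for key, reqs in new_deps.items():
            graph[(key, True)] = {(r, True) for r in reqs}
        # Cleanups run in reverse order: a resource is removed only once
        # everything that required it has been updated or cleaned up.
        for key in old_deps:
            graph[(key, False)] = set()
        for key, reqs in old_deps.items():
            for req in reqs:
                blocker = (key, True) if (key, True) in graph else (key, False)
                graph.setdefault((req, False), set()).add(blocker)
        # A resource's own cleanup also waits for its own update, if any.
        for key in old_deps:
            if (key, True) in graph:
                graph[(key, False)].add((key, True))
        return graph

    old = {'volume': set(), 'attachment': {'volume'}}
    new = {'volume': set(), 'server': {'volume'}}   # attachment removed
    print(combined_graph(new, old))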



(Any post operation will be handled as an update.) This approach is true when
Rollback==True. We can always fall back to the regular stream (non-delete
stream) if Rollback==False.


I don't understand what you're saying here.


In our view we would like to have only one basic operation and that is UPDATE.

1. CREATE will be an update where a realized graph == Empty
2. UPDATE will be an update where realized graph == Full/Partial realized 
(possibly with a delete stream as post operation if Rollback==True)
3. DELETE will be just another update with an empty to_be_realized_graph.


Yes, that goes without saying. In my implementation 
Stack._create_or_update() handles all three operations.
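
i.e. roughly like this (sketch only, with made-up names; the real logic is
Stack._create_or_update in the prototype):

    def converge(old_resources, new_template):
        """One entry point for create, update and delete (illustrative).

        old_resources: {name: ...} currently realised (empty on create).
        new_template:  {name: definition} to converge to (empty on delete).
        """
        to_update = set(new_template)                # empty on a delete
        to_clean_up = set(old_resources) - to_update
        # ...build one traversal graph from both sets and kick it off...
        return sorted(to_update), sorted(to_clean_up)

    print(converge({}, {'server': {}, 'volume': {}}))   # create
    print(converge({'server': {}}, {'server': {}}))     # update
    print(converge({'server': {}}, {}))                 # delete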



It would be great if we can freeze a stable approach by mid-week as Christmas 
vacations are round the corner.  :) :)


-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Saturday, December 13, 2014 5:43 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-de

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 09:32, Anant Patil wrote:

On 12-Dec-14 06:29, Zane Bitter wrote:

On 11/12/14 01:14, Anant Patil wrote:

On 04-Dec-14 10:49, Zane Bitter wrote:

On 01/12/14 02:02, Anant Patil wrote:

On GitHub:https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I
don't understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off
any actual updates to check if the resource will change. This is
impossible to do in general, since a resource may obtain a property
value from an attribute of another resource and there is no way to know
whether an update to said other resource would cause a change in the
attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that
looks like a simple fix, I'm now worried about the level to which this
code has been tested.


We were working on a new branch and as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:
https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

When a new resource is taken for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will have all the "get_attr" resolved.
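
For what it's worth, a toy illustration of that freezing step (hypothetical
structures; real templates resolve HOT/CFN intrinsic functions, not bare
dicts): live get_attr references are replaced by the attribute values of the
already-converged children before the definition is acted on.

    def freeze_definition(definition, resolved_children):
        """Resolve {'get_attr': [child, attr]} markers against realised children.

        resolved_children: {child_name: {attr_name: value}} already converged.
        Illustrative only.
        """
        if isinstance(definition, dict):
            if set(definition) == {'get_attr'}:
                child, attr = definition['get_attr']
                return resolved_children[child][attr]
            return {k: freeze_definition(v, resolved_children)
                    for k, v in definition.items()}
        if isinstance(definition, list):
            return [freeze_definition(v, resolved_children) for v in definition]
        return definition

    props = {'ip': {'get_attr': ['server', 'first_address']}, 'port': 80}
    children = {'server': {'first_address': '10.0.0.5'}}
    print(freeze_definition(props, children))   # {'ip': '10.0.0.5', 'port': 80}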



I'm also trying to wrap my head around how resources are cleaned up in
dependency order. If I understand correctly, you store in the
ResourceGraph table the dependencies between various resource names in
the current template (presumably there could also be some left around
from previous templates too?). For each resource name there may be a
number of rows in the Resource table, each with an incrementing version.
As far as I can tell though, there's nowhere that the dependency graph
for _previous_ templates is persisted? So if the dependency order
changes in the template we have no way of knowing the correct order to
clean up in any more? (There's not even a mechanism to associate a
resource version with a particular template, which might be one avenue
by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I
added a scenario to my test framework to exercise it and discovered that
my implementation was also buggy. Here's the fix:
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40



Thanks for pointing this out Zane. We too had a buggy implementation for
handling inverted dependency. I had a hard look at our algorithm where
we were continuously merging the edges from new template into the edges
from previous updates. It was an optimized way of traversing the graph
in both forward and reverse order without missing any resources. But,
when the dependencies are inverted, this wouldn't work.

We have changed our algorithm. The changes in edges are noted down in
DB, only the delta of edges from previous template is calculated and
kept. At any given point of time, the graph table has all the edges from
current template and delta from previous templates. Each edge has
template ID associated with it.
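
Roughly, the delta idea could look something like this (a hypothetical
sketch, not the actual graph-table code):

    def edge_delta(current_edges, new_template_id, new_edges):
        """Keep the new template's edges plus the delta from older templates.

        current_edges: {(requirer, required): template_id} as stored so far.
        new_edges:     set of (requirer, required) pairs from the new template.
        Illustrative names only.
        """
        merged = {}
        # Edges from previous templates are kept only if the new template
        # no longer has them (the delta still needed for cleanup).
        for edge, tmpl_id in current_edges.items():
            if edge not in new_edges:
                merged[edge] = tmpl_id
        # Every edge in the new template is (re-)tagged with the new ID.
        for edge in new_edges:
            merged[edge] = new_template_id
        return merged

    old = {('attachment', 'volume'): 't1', ('server', 'attachment'): 't1'}
    new = {('server', 'volume')}          # dependencies inverted/changed
    print(edge_delta(old, 't2', new))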


The thing is, the cleanup dependencies aren't really about the template.
The real resources really depend on other real resources. You can't
delete a Volume before its VolumeAttachment, not because it says so in
the template but because it will fail if you try. The template can give
us a rough guide in advance to what those dependencies will be, but if
that's all we keep then we are discarding information.

There may be multiple versions of a resource corresponding to one
template version. Even worse, the actual dependencies of a resource
change on a smaller time scale than an entire stack update (this is the
reason the current implementation updates the template one resource at a
time as we go).



Absolutely! The edges from the template are kept only for reference
purposes. When we have a resource in the new template, its template ID will
also be marked to the current template. At any point of time, the realized
resource will be from the current template, even if it were found in previous
templates. The template ID "moves" for a resource if it is found.


In theory (disclaimer: I didn't implement this yet) it can change on an 
even smaller timescale than that. The existing plugins are something of 
a black box to us: if a failure occurs we don't necessarily know whether 
the real-world dependency is on the old or new version of another resource.



Given that our Resource entries in the DB are in 1:1 correspondence with
actual resources (we create a new one whenever we need to replace the
underlying resource), I found it makes the most conceptual and practical
sense to store the requirements in the resource itself, and update them
at the time they actually change in the real world (bonus: introduces no
new locking issues and no extra DB writes). I settled on this after a
legitimate attempt

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
> On 13-Dec-14 05:42, Zane Bitter wrote:
> > On 12/12/14 05:29, Murugan, Visnusaran wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: Zane Bitter [mailto:zbit...@redhat.com]
> >>> Sent: Friday, December 12, 2014 6:37 AM
> >>> To: openstack-dev@lists.openstack.org
> >>> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> >>> showdown
> >>>
> >>> On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >>>>>> [Murugan, Visnusaran]
> >>>>>> In case of rollback where we have to cleanup earlier version of
> >>>>>> resources,
> >>>>> we could get the order from old template. We'd prefer not to have a
> >>>>> graph table.
> >>>>>
> >>>>> In theory you could get it by keeping old templates around. But that
> >>>>> means keeping a lot of templates, and it will be hard to keep track
> >>>>> of when you want to delete them. It also means that when starting an
> >>>>> update you'll need to load every existing previous version of the
> >>>>> template in order to calculate the dependencies. It also leaves the
> >>>>> dependencies in an ambiguous state when a resource fails, and
> >>>>> although that can be worked around it will be a giant pain to implement.
> >>>>>
> >>>>
> >>>> Agree that looking at all templates for a delete is not good. But
> >>>> barring complexity, we feel we could achieve it by way of having an
> >>>> update and a delete stream for a stack update operation. I will
> >>>> elaborate in detail in the etherpad sometime tomorrow :)
> >>>>
> >>>>> I agree that I'd prefer not to have a graph table. After trying a
> >>>>> couple of different things I decided to store the dependencies in the
> >>>>> Resource table, where we can read or write them virtually for free
> >>>>> because it turns out that we are always reading or updating the
> >>>>> Resource itself at exactly the same time anyway.
> >>>>>
> >>>>
> >>>> Not sure how this will work in an update scenario when a resource does
> >>>> not change and its dependencies do.
> >>>
> >>> We'll always update the requirements, even when the properties don't
> >>> change.
> >>>
> >>
> >> Can you elaborate a bit on rollback.
> > 
> > I didn't do anything special to handle rollback. It's possible that we 
> > need to - obviously the difference in the UpdateReplace + rollback case 
> > is that the replaced resource is now the one we want to keep, and yet 
> > the replaced_by/replaces dependency will force the newer (replacement) 
> > resource to be checked for deletion first, which is an inversion of the 
> > usual order.
> > 
> 
> This is where the version is so handy! For UpdateReplaced ones, there is
> an older version to go back to. This version could just be template ID,
> as I mentioned in another e-mail. All resources are at the current
> template ID if they are found in the current template, even if there is
> no need to update them. Otherwise, they need to be cleaned-up in the
> order given in the previous templates.
> 
> I think the template ID is used as a version as far as I can see in Zane's
> PoC. If the resource template key doesn't match the current template
> key, the resource is deleted. The version is a misnomer here, but that
> field (template id) is used as though we had versions of resources.
> 
> > However, I tried to think of a scenario where that would cause problems 
> > and I couldn't come up with one. Provided we know the actual, real-world 
> > dependencies of each resource I don't think the ordering of those two 
> > checks matters.
> > 
> > In fact, I currently can't think of a case where the dependency order 
> > between replacement and replaced resources matters at all. It matters in 
> > the current Heat implementation because resources are artificially 
> > segmented into the current and backup stacks, but with a holistic view 
> > of dependencies that may well not be required. I tried taking that line 
> > out of the simulator code and all the tests still passed. If anybody can 
> > think of a scenario in which it would make a difference, I would be

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Anant Patil
On 13-Dec-14 05:42, Zane Bitter wrote:
> On 12/12/14 05:29, Murugan, Visnusaran wrote:
>>
>>
>>> -Original Message-
>>> From: Zane Bitter [mailto:zbit...@redhat.com]
>>> Sent: Friday, December 12, 2014 6:37 AM
>>> To: openstack-dev@lists.openstack.org
>>> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
>>> showdown
>>>
>>> On 11/12/14 08:26, Murugan, Visnusaran wrote:
>>>>>> [Murugan, Visnusaran]
>>>>>> In case of rollback where we have to cleanup earlier version of
>>>>>> resources,
>>>>> we could get the order from old template. We'd prefer not to have a
>>>>> graph table.
>>>>>
>>>>> In theory you could get it by keeping old templates around. But that
>>>>> means keeping a lot of templates, and it will be hard to keep track
>>>>> of when you want to delete them. It also means that when starting an
>>>>> update you'll need to load every existing previous version of the
>>>>> template in order to calculate the dependencies. It also leaves the
>>>>> dependencies in an ambiguous state when a resource fails, and
>>>>> although that can be worked around it will be a giant pain to implement.
>>>>>
>>>>
>>>> Agree that looking at all templates for a delete is not good. But
>>>> barring complexity, we feel we could achieve it by way of having an
>>>> update and a delete stream for a stack update operation. I will
>>>> elaborate in detail in the etherpad sometime tomorrow :)
>>>>
>>>>> I agree that I'd prefer not to have a graph table. After trying a
>>>>> couple of different things I decided to store the dependencies in the
>>>>> Resource table, where we can read or write them virtually for free
>>>>> because it turns out that we are always reading or updating the
>>>>> Resource itself at exactly the same time anyway.
>>>>>
>>>>
>>>> Not sure how this will work in an update scenario when a resource does
>>>> not change and its dependencies do.
>>>
>>> We'll always update the requirements, even when the properties don't
>>> change.
>>>
>>
>> Can you elaborate a bit on rollback.
> 
> I didn't do anything special to handle rollback. It's possible that we 
> need to - obviously the difference in the UpdateReplace + rollback case 
> is that the replaced resource is now the one we want to keep, and yet 
> the replaced_by/replaces dependency will force the newer (replacement) 
> resource to be checked for deletion first, which is an inversion of the 
> usual order.
> 

This is where the version is so handy! For UpdateReplaced ones, there is
an older version to go back to. This version could just be the template ID,
as I mentioned in another e-mail. All resources are at the current
template ID if they are found in the current template, even if there is
no need to update them. Otherwise, they need to be cleaned up in the
order given in the previous templates.

I think the template ID is used as a version as far as I can see in Zane's
PoC. If the resource template key doesn't match the current template
key, the resource is deleted. The version is a misnomer here, but that
field (template id) is used as though we had versions of resources.
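
(Purely illustrative sketch of the template-ID-as-version idea above; the names
and data shapes are invented, not the PoC's:)

    CURRENT_TEMPLATE = 't-3'

    # Realized resources, each remembering which template it was last seen in.
    resources = {
        'server': {'template_id': 't-2'},
        'volume': {'template_id': 't-1'},
        'old_lb': {'template_id': 't-1'},
    }

    new_template_resources = {'server', 'volume'}   # names present in t-3

    # The template ID "moves" for every resource still present in the new
    # template, even if its properties did not change.
    for name in new_template_resources:
        if name in resources:
            resources[name]['template_id'] = CURRENT_TEMPLATE

    # Anything still pointing at an older template is a cleanup candidate,
    # to be removed in the order given by the previous templates.
    to_clean = [n for n, r in resources.items()
                if r['template_id'] != CURRENT_TEMPLATE]
    print(to_clean)   # ['old_lb']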

> However, I tried to think of a scenario where that would cause problems 
> and I couldn't come up with one. Provided we know the actual, real-world 
> dependencies of each resource I don't think the ordering of those two 
> checks matters.
> 
> In fact, I currently can't think of a case where the dependency order 
> between replacement and replaced resources matters at all. It matters in 
> the current Heat implementation because resources are artificially 
> segmented into the current and backup stacks, but with a holistic view 
> of dependencies that may well not be required. I tried taking that line 
> out of the simulator code and all the tests still passed. If anybody can 
> think of a scenario in which it would make a difference, I would be very 
> interested to hear it.
> 
> In any event though, it should be no problem to reverse the direction of 
> that one edge in these particular circumstances if it does turn out to 
> be a problem.
> 
>> We had an approach with depends_on
>> and needed_by columns in ResourceTable. But dropped it when we figured out
>> we had too many DB operations for Update.
> 
> Yeah, I initially ran into this problem too - you have a bun

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Anant Patil
On 12-Dec-14 06:29, Zane Bitter wrote:
> On 11/12/14 01:14, Anant Patil wrote:
>> On 04-Dec-14 10:49, Zane Bitter wrote:
>>> On 01/12/14 02:02, Anant Patil wrote:
 On GitHub:https://github.com/anantpatil/heat-convergence-poc
>>>
>>> I'm trying to review this code at the moment, and finding some stuff I
>>> don't understand:
>>>
>>> https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
>>>
>>> This appears to loop through all of the resources *prior* to kicking off
>>> any actual updates to check if the resource will change. This is
>>> impossible to do in general, since a resource may obtain a property
>>> value from an attribute of another resource and there is no way to know
>>> whether an update to said other resource would cause a change in the
>>> attribute value.
>>>
>>> In addition, no attempt to catch UpdateReplace is made. Although that
>>> looks like a simple fix, I'm now worried about the level to which this
>>> code has been tested.
>>>
>> We were working on a new branch and, as we discussed on Skype, we have
>> handled all these cases. Please have a look at our current branch:
>> https://github.com/anantpatil/heat-convergence-poc/tree/graph-version
>>
>> When a new resource is taken for convergence, its children are loaded
>> and the resource definition is re-parsed. The frozen resource definition
>> will have all the "get_attr" resolved.
>>
>>>
>>> I'm also trying to wrap my head around how resources are cleaned up in
>>> dependency order. If I understand correctly, you store in the
>>> ResourceGraph table the dependencies between various resource names in
>>> the current template (presumably there could also be some left around
>>> from previous templates too?). For each resource name there may be a
>>> number of rows in the Resource table, each with an incrementing version.
>>> As far as I can tell though, there's nowhere that the dependency graph
>>> for _previous_ templates is persisted? So if the dependency order
>>> changes in the template we have no way of knowing the correct order to
>>> clean up in any more? (There's not even a mechanism to associate a
>>> resource version with a particular template, which might be one avenue
>>> by which to recover the dependencies.)
>>>
>>> I think this is an important case we need to be able to handle, so I
>>> added a scenario to my test framework to exercise it and discovered that
>>> my implementation was also buggy. Here's the fix:
>>> https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40
>>>
>>
>> Thanks for pointing this out Zane. We too had a buggy implementation for
>> handling inverted dependencies. I had a hard look at our algorithm, where
>> we were continuously merging the edges from the new template into the edges
>> from previous updates. It was an optimized way of traversing the graph
>> in both forward and reverse order without missing any resources. But
>> when the dependencies are inverted, this wouldn't work.
>>
>> We have changed our algorithm. The changes in edges are noted down in the
>> DB; only the delta of edges from the previous template is calculated and
>> kept. At any given point of time, the graph table has all the edges from the
>> current template and the delta from previous templates. Each edge has a
>> template ID associated with it.
> 
> The thing is, the cleanup dependencies aren't really about the template. 
> The real resources really depend on other real resources. You can't 
> delete a Volume before its VolumeAttachment, not because it says so in 
> the template but because it will fail if you try. The template can give 
> us a rough guide in advance to what those dependencies will be, but if 
> that's all we keep then we are discarding information.
> 
> There may be multiple versions of a resource corresponding to one 
> template version. Even worse, the actual dependencies of a resource 
> change on a smaller time scale than an entire stack update (this is the 
> reason the current implementation updates the template one resource at a 
> time as we go).
> 

Absolutely! The edges from the template are kept only for reference
purposes. When we have a resource in the new template, its template ID will
also be marked to the current template. At any point of time, a realized
resource will be from the current template, even if it was found in previous
templates. The template ID "moves" for a resource if it is found.

> Given that our Resource entries in the DB are in 1:1 correspondence with 
> actual resources (we create a new one whenever we need to replace the 
> underlying resource), I found it makes the most conceptual and practical 
> sense to store the requirements in the resource itself, and update them 
> at the time they actually change in the real world (bonus: introduces no 
> new locking issues and no extra DB writes). I settled on this after a 
> legitimate attempt at trying other options, but they didn't work out: 
> https://github.com/zaneb/heat

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Murugan, Visnusaran
Hi Zane,

We have been going through this chain for quite some time now and we still feel
a disconnect in our understanding.
Can you put up an etherpad where we can understand your approach? For example,
for storing resource dependencies, are you storing a (name, version) tuple or
just the resource ID? If I understand correctly, you are updating all resources
on every update regardless of whether they changed, which will be inefficient
if a stack contains a million resources. We have similar questions about other
areas of your implementation, which we believe would be answered if we
understood its outline. It is difficult to get a hold on your approach just by
looking at the code. Docstrings / an etherpad will help.


About streams: yes, in a million-resource stack the data will be huge, but less
than the template. Also, this stream is stored only in IN_PROGRESS resources.
The reason to have the entire dependency list is to reduce DB queries during a
stack update. When you have a singular dependency on each resource, similar to
your implementation, we will end up loading dependencies one at a time and
altering almost every resource's dependencies regardless of whether they
changed.

Regarding the 2-template approach for delete: it is not actually 2 different
templates. It's just that we have a delete stream to be taken up post-update.
(Any post operation will be handled as an update.) This approach applies when
Rollback==True; we can always fall back to the regular (non-delete) stream if
Rollback==False.

In our view we would like to have only one basic operation, and that is UPDATE
(a rough sketch follows the list below):

1. CREATE will be an update where the realized graph == empty.
2. UPDATE will be an update where the realized graph == fully/partially realized
(possibly with a delete stream as a post operation if Rollback==True).
3. DELETE will be just another update with an empty to_be_realized_graph.
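
(A rough sketch of the three cases above as a single update operation; the
function and names are ours, not the PoC's:)

    def converge(realized, to_be_realized):
        """Return what must be created and what must be cleaned up."""
        return to_be_realized - realized, realized - to_be_realized

    existing = {'server', 'volume'}

    print(converge(set(), {'server'}))      # CREATE: realized graph is empty
    print(converge(existing, {'server'}))   # UPDATE: 'volume' goes to the delete stream
    print(converge(existing, set()))        # DELETE: empty to_be_realized_graph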

It would be great if we could freeze a stable approach by mid-week, as the
Christmas vacation is around the corner. :) :)

> -Original Message-
> From: Zane Bitter [mailto:zbit...@redhat.com]
> Sent: Saturday, December 13, 2014 5:43 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> On 12/12/14 05:29, Murugan, Visnusaran wrote:
> >
> >
> >> -Original Message-
> >> From: Zane Bitter [mailto:zbit...@redhat.com]
> >> Sent: Friday, December 12, 2014 6:37 AM
> >> To: openstack-dev@lists.openstack.org
> >> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> >> showdown
> >>
> >> On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >>>>> [Murugan, Visnusaran]
> >>>>> In case of rollback where we have to cleanup earlier version of
> >>>>> resources,
> >>>> we could get the order from old template. We'd prefer not to have a
> >>>> graph table.
> >>>>
> >>>> In theory you could get it by keeping old templates around. But
> >>>> that means keeping a lot of templates, and it will be hard to keep
> >>>> track of when you want to delete them. It also means that when
> >>>> starting an update you'll need to load every existing previous
> >>>> version of the template in order to calculate the dependencies. It
> >>>> also leaves the dependencies in an ambiguous state when a resource
> >>>> fails, and although that can be worked around it will be a giant pain to
> implement.
> >>>>
> >>>
> >>> Agree that looking at all templates for a delete is not good. But
> >>> barring complexity, we feel we could achieve it by way of having an
> >>> update and a delete stream for a stack update operation. I will
> >>> elaborate in detail in the etherpad sometime tomorrow :)
> >>>
> >>>> I agree that I'd prefer not to have a graph table. After trying a
> >>>> couple of different things I decided to store the dependencies in
> >>>> the Resource table, where we can read or write them virtually for
> >>>> free because it turns out that we are always reading or updating
> >>>> the Resource itself at exactly the same time anyway.
> >>>>
> >>>
> >>> Not sure how this will work in an update scenario when a resource
> >>> does not change and its dependencies do.
> >>
> >> We'll always update the requirements, even when the properties don't
> >> change.
> >>
> >
> > Can you elaborate a bit on rollback.
> 
> I didn't do anything special to handle rollback. It's possible that we need 
> to -
> obviously the difference in t

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Zane Bitter

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of
resources,

we could get the order from old template. We'd prefer not to have a
graph table.

In theory you could get it by keeping old templates around. But that
means keeping a lot of templates, and it will be hard to keep track
of when you want to delete them. It also means that when starting an
update you'll need to load every existing previous version of the
template in order to calculate the dependencies. It also leaves the
dependencies in an ambiguous state when a resource fails, and
although that can be worked around it will be a giant pain to implement.



Agree that looking at all templates for a delete is not good. But
barring complexity, we feel we could achieve it by way of having an
update and a delete stream for a stack update operation. I will
elaborate in detail in the etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a
couple of different things I decided to store the dependencies in the
Resource table, where we can read or write them virtually for free
because it turns out that we are always reading or updating the
Resource itself at exactly the same time anyway.



Not sure how this will work in an update scenario when a resource does
not change and its dependencies do.


We'll always update the requirements, even when the properties don't
change.



Can you elaborate a bit on rollback.


I didn't do anything special to handle rollback. It's possible that we 
need to - obviously the difference in the UpdateReplace + rollback case 
is that the replaced resource is now the one we want to keep, and yet 
the replaced_by/replaces dependency will force the newer (replacement) 
resource to be checked for deletion first, which is an inversion of the 
usual order.


However, I tried to think of a scenario where that would cause problems 
and I couldn't come up with one. Provided we know the actual, real-world 
dependencies of each resource I don't think the ordering of those two 
checks matters.


In fact, I currently can't think of a case where the dependency order 
between replacement and replaced resources matters at all. It matters in 
the current Heat implementation because resources are artificially 
segmented into the current and backup stacks, but with a holistic view 
of dependencies that may well not be required. I tried taking that line 
out of the simulator code and all the tests still passed. If anybody can 
think of a scenario in which it would make a difference, I would be very 
interested to hear it.


In any event though, it should be no problem to reverse the direction of 
that one edge in these particular circumstances if it does turn out to 
be a problem.



We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.


Yeah, I initially ran into this problem too - you have a bunch of nodes 
that are waiting on the current node, and now you have to go look them 
all up in the database to see what else they're waiting on in order to 
tell if they're ready to be triggered.


It turns out the answer is to distribute the writes but centralise the 
reads. So at the start of the update, we read all of the Resources, 
obtain their dependencies and build one central graph[1]. We than make 
that graph available to each resource (either by passing it as a 
notification parameter, or storing it somewhere central in the DB that 
they will all have to read anyway, i.e. the Stack). But when we update a 
dependency we don't update the central graph, we update the individual 
Resource so there's no global lock required.


[1] 
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/stack.py#L166-L168
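
(Illustrative sketch of "centralise the reads, distribute the writes"; the data
shapes and names are invented, not the prototype's:)

    # Central graph, built from a single bulk read of all Resource rows at the
    # start of the update and then handed to every worker.
    requires = {
        'server': set(),
        'volume': set(),
        'attach': {'server', 'volume'},
    }

    done = set()    # stands in for the per-resource completion writes

    def on_complete(name):
        done.add(name)    # distributed write: each worker touches only its row
        # Centralised read: the graph already tells us who was waiting on
        # `name` and what else they are waiting for, without re-querying.
        for waiter, deps in requires.items():
            if name in deps and deps <= done:
                print('trigger', waiter)

    on_complete('server')   # nothing ready yet
    on_complete('volume')   # trigger attach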



Also taking care of deleting resources in order will be an issue.


It works fine.


This implies that there will be different versions of a resource which
will complicate things even further.


No it doesn't, other than the different versions we already have due to
UpdateReplace.


This approach reduces DB queries by waiting for completion notification
on a topic. The drawback I see is that delete stack stream will be
huge as it will have the entire graph. We can always dump such data
in ResourceLock.data Json and pass a simple flag
"load_stream_from_db" to converge RPC call as a workaround for delete operation.


This seems to be essentially equivalent to my 'SyncPoint'
proposal[1], with the key difference that the data is stored in-memo

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Murugan, Visnusaran
Hi zaneb,

Etherpad updated. 

https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

> -Original Message-
> From: Murugan, Visnusaran
> Sent: Friday, December 12, 2014 4:00 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> 
> 
> > -Original Message-
> > From: Zane Bitter [mailto:zbit...@redhat.com]
> > Sent: Friday, December 12, 2014 6:37 AM
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> > showdown
> >
> > On 11/12/14 08:26, Murugan, Visnusaran wrote:
> > >>> [Murugan, Visnusaran]
> > >>> In case of rollback where we have to cleanup earlier version of
> > >>> resources,
> > >> we could get the order from old template. We'd prefer not to have a
> > >> graph table.
> > >>
> > >> In theory you could get it by keeping old templates around. But
> > >> that means keeping a lot of templates, and it will be hard to keep
> > >> track of when you want to delete them. It also means that when
> > >> starting an update you'll need to load every existing previous
> > >> version of the template in order to calculate the dependencies. It
> > >> also leaves the dependencies in an ambiguous state when a resource
> > >> fails, and although that can be worked around it will be a giant pain to
> implement.
> > >>
> > >
> > > Agree that looking at all templates for a delete is not good. But
> > > barring complexity, we feel we could achieve it by way of having an
> > > update and a delete stream for a stack update operation. I will
> > > elaborate in detail in the etherpad sometime tomorrow :)
> > >
> > >> I agree that I'd prefer not to have a graph table. After trying a
> > >> couple of different things I decided to store the dependencies in
> > >> the Resource table, where we can read or write them virtually for
> > >> free because it turns out that we are always reading or updating
> > >> the Resource itself at exactly the same time anyway.
> > >>
> > >
> > > Not sure how this will work in an update scenario when a resource
> > > does not change and its dependencies do.
> >
> > We'll always update the requirements, even when the properties don't
> > change.
> >
> 
> Can you elaborate a bit on rollback.  We had an approach with depends_on
> and needed_by columns in ResourceTable. But dropped it when we figured
> out we had too many DB operations for Update.
> 
> > > Also taking care of deleting resources in order will be an issue.
> >
> > It works fine.
> >
> > > This implies that there will be different versions of a resource
> > > which will complicate things even further.
> >
> > No it doesn't, other than the different versions we already have due
> > to UpdateReplace.
> >
> > >>>> This approach reduces DB queries by waiting for completion
> > >>>> notification
> > >> on a topic. The drawback I see is that delete stack stream will be
> > >> huge as it will have the entire graph. We can always dump such data
> > >> in ResourceLock.data Json and pass a simple flag
> > >> "load_stream_from_db" to converge RPC call as a workaround for
> > >> delete
> > operation.
> > >>>
> > >>> This seems to be essentially equivalent to my 'SyncPoint'
> > >>> proposal[1], with
> > >> the key difference that the data is stored in-memory in a Heat
> > >> engine rather than the database.
> > >>>
> > >>> I suspect it's probably a mistake to move it in-memory for similar
> > >>> reasons to the argument Clint made against synchronising the
> > >>> marking off
> > >> of dependencies in-memory. The database can handle that and the
> > >> problem of making the DB robust against failures of a single
> > >> machine has already been solved by someone else. If we do it
> > >> in-memory we are just creating a single point of failure for not
> > >> much gain. (I guess you could argue it doesn't matter, since if any
> > >> Heat engine dies during the traversal then we'll have to kick off
> > >> another one anyway, but it does limit our options if that changes
> > >

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Murugan, Visnusaran


> -Original Message-
> From: Zane Bitter [mailto:zbit...@redhat.com]
> Sent: Friday, December 12, 2014 6:37 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >>> [Murugan, Visnusaran]
> >>> In case of rollback where we have to cleanup earlier version of
> >>> resources,
> >> we could get the order from old template. We'd prefer not to have a
> >> graph table.
> >>
> >> In theory you could get it by keeping old templates around. But that
> >> means keeping a lot of templates, and it will be hard to keep track
> >> of when you want to delete them. It also means that when starting an
> >> update you'll need to load every existing previous version of the
> >> template in order to calculate the dependencies. It also leaves the
> >> dependencies in an ambiguous state when a resource fails, and
> >> although that can be worked around it will be a giant pain to implement.
> >>
> >
> > Agree that looking at all templates for a delete is not good. But
> > barring complexity, we feel we could achieve it by way of having an
> > update and a delete stream for a stack update operation. I will
> > elaborate in detail in the etherpad sometime tomorrow :)
> >
> >> I agree that I'd prefer not to have a graph table. After trying a
> >> couple of different things I decided to store the dependencies in the
> >> Resource table, where we can read or write them virtually for free
> >> because it turns out that we are always reading or updating the
> >> Resource itself at exactly the same time anyway.
> >>
> >
> > Not sure how this will work in an update scenario when a resource does
> > not change and its dependencies do.
> 
> We'll always update the requirements, even when the properties don't
> change.
> 

Can you elaborate a bit on rollback.  We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.

> > Also taking care of deleting resources in order will be an issue.
> 
> It works fine.
> 
> > This implies that there will be different versions of a resource which
> > will complicate things even further.
> 
> No it doesn't, other than the different versions we already have due to
> UpdateReplace.
> 
> >>>> This approach reduces DB queries by waiting for completion
> >>>> notification
> >> on a topic. The drawback I see is that delete stack stream will be
> >> huge as it will have the entire graph. We can always dump such data
> >> in ResourceLock.data Json and pass a simple flag
> >> "load_stream_from_db" to converge RPC call as a workaround for delete
> operation.
> >>>
> >>> This seems to be essentially equivalent to my 'SyncPoint'
> >>> proposal[1], with
> >> the key difference that the data is stored in-memory in a Heat engine
> >> rather than the database.
> >>>
> >>> I suspect it's probably a mistake to move it in-memory for similar
> >>> reasons to the argument Clint made against synchronising the marking
> >>> off
> >> of dependencies in-memory. The database can handle that and the
> >> problem of making the DB robust against failures of a single machine
> >> has already been solved by someone else. If we do it in-memory we are
> >> just creating a single point of failure for not much gain. (I guess
> >> you could argue it doesn't matter, since if any Heat engine dies
> >> during the traversal then we'll have to kick off another one anyway,
> >> but it does limit our options if that changes in the
> >> future.) [Murugan, Visnusaran] Resource completes, removes itself
> >> from resource_lock and notifies engine. Engine will acquire parent
> >> lock and initiate parent only if all its children are satisfied (no child 
> >> entry in
> resource_lock).
> >> This will come in place of Aggregator.
> >>
> >> Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I
> did.
> >> The three differences I can see are:
> >>
> >> 1) I think you are proposing to create all of the sync points at the
> >> start of the traversal, rather than on an as-needed basis. This is
> >> probably a good idea. I didn't consider it because of the way my
> >> prototype evolved, 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Zane Bitter

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of resources,

we could get the order from old template. We'd prefer not to have a graph
table.

In theory you could get it by keeping old templates around. But that means
keeping a lot of templates, and it will be hard to keep track of when you want
to delete them. It also means that when starting an update you'll need to
load every existing previous version of the template in order to calculate the
dependencies. It also leaves the dependencies in an ambiguous state when a
resource fails, and although that can be worked around it will be a giant pain
to implement.



Agree that looking at all templates for a delete is not good. But barring
complexity, we feel we could achieve it by way of having an update and a
delete stream for a stack update operation. I will elaborate in detail in the
etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a couple of
different things I decided to store the dependencies in the Resource table,
where we can read or write them virtually for free because it turns out that
we are always reading or updating the Resource itself at exactly the same
time anyway.



Not sure how this will work in an update scenario when a resource does not
change and its dependencies do.


We'll always update the requirements, even when the properties don't change.


Also taking care of deleting resources in order will
be an issue.


It works fine.


This implies that there will be different versions of a resource which
will complicate things even further.


No it doesn't, other than the different versions we already have due to 
UpdateReplace.



This approach reduces DB queries by waiting for completion notification
on a topic. The drawback I see is that delete stack stream will be huge as it
will have the entire graph. We can always dump such data in
ResourceLock.data Json and pass a simple flag "load_stream_from_db" to
converge RPC call as a workaround for delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with
the key difference that the data is stored in-memory in a Heat engine rather
than the database.


I suspect it's probably a mistake to move it in-memory for similar
reasons to the argument Clint made against synchronising the marking off
of dependencies in-memory. The database can handle that and the problem
of making the DB robust against failures of a single machine has already been
solved by someone else. If we do it in-memory we are just creating a single
point of failure for not much gain. (I guess you could argue it doesn't matter,
since if any Heat engine dies during the traversal then we'll have to kick off
another one anyway, but it does limit our options if that changes in the
future.) [Murugan, Visnusaran] Resource completes, removes itself from
resource_lock and notifies engine. Engine will acquire parent lock and initiate
parent only if all its children are satisfied (no child entry in resource_lock).
This will come in place of Aggregator.

Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I did.
The three differences I can see are:

1) I think you are proposing to create all of the sync points at the start of 
the
traversal, rather than on an as-needed basis. This is probably a good idea. I
didn't consider it because of the way my prototype evolved, but there's now
no reason I can see not to do this.
If we could move the data to the Resource table itself then we could even
get it for free from an efficiency point of view.


+1. But we will need engine_id to be stored somewhere for recovery purposes
(in an easily queried format).


Yeah, so I'm starting to think you're right, maybe the/a Lock table is 
the right thing to use there. We could probably do it within the 
resource table using the same select-for-update to set the engine_id, 
but I agree that we might be starting to jam too much into that one table.
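
(A hedged sketch of claiming a resource by writing engine_id into its row.
SQLite has no SELECT ... FOR UPDATE, so an equivalent atomic compare-and-swap
UPDATE stands in here; the table and column names are invented:)

    import sqlite3

    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE resource (id INTEGER PRIMARY KEY, engine_id TEXT)')
    db.execute('INSERT INTO resource (id, engine_id) VALUES (1, NULL)')

    def claim(resource_id, engine_id):
        # Succeeds only if no other engine has claimed the row yet.
        cur = db.execute(
            'UPDATE resource SET engine_id = ? '
            'WHERE id = ? AND engine_id IS NULL',
            (engine_id, resource_id))
        db.commit()
        return cur.rowcount == 1

    print(claim(1, 'engine-A'))   # True  - this engine now owns the resource
    print(claim(1, 'engine-B'))   # False - already locked by engine-A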



Sync points are created as needed. A single resource is enough to restart that
entire stream.
I think there is a disconnect in our understanding. I will detail it as well in 
the etherpad.


OK, that would be good.


2) You're using a single list from which items are removed, rather than two
lists (one static, and one to which items are added) that get compared.
Assuming (1) then this is probably a good idea too.


Yeah. We have a single list per active stream, which works by removing
complete/satisfied resources from it.


I went to change this and then remembered why I did it this way: the 
sync point is also storing data about the resources that are triggering 
it. Part of this is the RefID and attributes, and we could replace that 
by storing that data in the Resource itself and querying it rather than 
having it passed in via the notification. But the other part is the 
ID/key of those resources, which we _need_ to know in order to update

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Zane Bitter

On 11/12/14 01:14, Anant Patil wrote:

On 04-Dec-14 10:49, Zane Bitter wrote:

On 01/12/14 02:02, Anant Patil wrote:

On GitHub:https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I
don't understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off
any actual updates to check if the resource will change. This is
impossible to do in general, since a resource may obtain a property
value from an attribute of another resource and there is no way to know
whether an update to said other resource would cause a change in the
attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that
looks like a simple fix, I'm now worried about the level to which this
code has been tested.


We were working on a new branch and, as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:
https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

When a new resource is taken for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will have all the "get_attr" resolved.



I'm also trying to wrap my head around how resources are cleaned up in
dependency order. If I understand correctly, you store in the
ResourceGraph table the dependencies between various resource names in
the current template (presumably there could also be some left around
from previous templates too?). For each resource name there may be a
number of rows in the Resource table, each with an incrementing version.
As far as I can tell though, there's nowhere that the dependency graph
for _previous_ templates is persisted? So if the dependency order
changes in the template we have no way of knowing the correct order to
clean up in any more? (There's not even a mechanism to associate a
resource version with a particular template, which might be one avenue
by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I
added a scenario to my test framework to exercise it and discovered that
my implementation was also buggy. Here's the fix:
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40



Thanks for pointing this out Zane. We too had a buggy implementation for
handling inverted dependencies. I had a hard look at our algorithm, where
we were continuously merging the edges from the new template into the edges
from previous updates. It was an optimized way of traversing the graph
in both forward and reverse order without missing any resources. But
when the dependencies are inverted, this wouldn't work.

We have changed our algorithm. The changes in edges are noted down in the
DB; only the delta of edges from the previous template is calculated and
kept. At any given point of time, the graph table has all the edges from the
current template and the delta from previous templates. Each edge has a
template ID associated with it.
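
(A rough sketch, with invented names, of the edge-delta bookkeeping described
in the quoted paragraph; not the actual PoC schema:)

    old_edges = {('attach', 'volume'), ('attach', 'server')}   # template t-1
    new_edges = {('attach', 'volume'), ('lb', 'server')}       # template t-2

    # Graph "table": every edge carries the template ID it belongs to.
    graph_table = [(src, dst, 't-2') for src, dst in new_edges]

    # Keep only the delta from the previous template: edges that are gone from
    # the new template but are still needed to clean up old resources.
    for src, dst in old_edges - new_edges:
        graph_table.append((src, dst, 't-1'))

    for row in sorted(graph_table):
        print(row)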


The thing is, the cleanup dependencies aren't really about the template. 
The real resources really depend on other real resources. You can't 
delete a Volume before its VolumeAttachment, not because it says so in 
the template but because it will fail if you try. The template can give 
us a rough guide in advance to what those dependencies will be, but if 
that's all we keep then we are discarding information.


There may be multiple versions of a resource corresponding to one 
template version. Even worse, the actual dependencies of a resource 
change on a smaller time scale than an entire stack update (this is the 
reason the current implementation updates the template one resource at a 
time as we go).


Given that our Resource entries in the DB are in 1:1 correspondence with 
actual resources (we create a new one whenever we need to replace the 
underlying resource), I found it makes the most conceptual and practical 
sense to store the requirements in the resource itself, and update them 
at the time they actually change in the real world (bonus: introduces no 
new locking issues and no extra DB writes). I settled on this after a 
legitimate attempt at trying other options, but they didn't work out: 
https://github.com/zaneb/heat-convergence-prototype/commit/a62958342e8583f74e2aca90f6239ad457ba984d
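
(A hedged sketch of storing the real-world requirements on the resource row
itself; the class and attribute names are invented, not Heat's schema:)

    class ResourceRow(object):
        """Stand-in for a row in the Resource table."""
        def __init__(self, name):
            self.name = name
            self.requires = set()   # real-world requirements live on the row

    volume = ResourceRow('volume')
    attachment = ResourceRow('volume_attachment')

    def attach_volume(attachment, volume):
        # ... the real attach call would happen here ...
        # The dependency now exists in the real world, so record it on the
        # row we are already writing anyway - no extra DB write needed.
        attachment.requires.add(volume.name)

    attach_volume(attachment, volume)
    print(attachment.requires)   # {'volume'}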



For resource clean up, we start from the
first template (template which was completed and updates were made on
top of it, empty template otherwise), and move towards the current
template in the order in which the updates were issued, and for each
template the graph (edges if found for the template) is traversed in
reverse order and resources are cleaned-up.


I'm pretty sure this is backwards - you'll need to clean up newer 
resources first because they may reference resources from older 
templates. Also if you have a stubborn old resource that won't 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Murugan, Visnusaran


> -Original Message-
> From: Zane Bitter [mailto:zbit...@redhat.com]
> Sent: Wednesday, December 10, 2014 11:17 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> You really need to get a real email client with quoting support ;)
Apologies :) I screwed up my mail client's configuration.

> 
> On 10/12/14 06:42, Murugan, Visnusaran wrote:
> > Well, we still have to persist the dependencies of each version of a
> resource _somehow_, because otherwise we can't know how to clean them
> up in the correct order. But what I think you meant to say is that this
> approach doesn't require it to be persisted in a separate table where the
> rows are marked as traversed as we work through the graph.
> >
> > [Murugan, Visnusaran]
> > In case of rollback where we have to cleanup earlier version of resources,
> we could get the order from old template. We'd prefer not to have a graph
> table.
> 
> In theory you could get it by keeping old templates around. But that means
> keeping a lot of templates, and it will be hard to keep track of when you want
> to delete them. It also means that when starting an update you'll need to
> load every existing previous version of the template in order to calculate the
> dependencies. It also leaves the dependencies in an ambiguous state when a
> resource fails, and although that can be worked around it will be a giant pain
> to implement.
> 

Agree that looking at all templates for a delete is not good. But barring
complexity, we feel we could achieve it by way of having an update and a
delete stream for a stack update operation. I will elaborate in detail in the
etherpad sometime tomorrow :)

> I agree that I'd prefer not to have a graph table. After trying a couple of
> different things I decided to store the dependencies in the Resource table,
> where we can read or write them virtually for free because it turns out that
> we are always reading or updating the Resource itself at exactly the same
> time anyway.
> 

Not sure how this will work in an update scenario when a resource does not
change and its dependencies do. Also taking care of deleting resources in order
will be an issue. This implies that there will be different versions of a
resource, which will complicate things even further.

> >> This approach reduces DB queries by waiting for completion notification
> on a topic. The drawback I see is that delete stack stream will be huge as it
> will have the entire graph. We can always dump such data in
> ResourceLock.data Json and pass a simple flag "load_stream_from_db" to
> converge RPC call as a workaround for delete operation.
> >
> > This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with
> the key difference that the data is stored in-memory in a Heat engine rather
> than the database.
> >
> > I suspect it's probably a mistake to move it in-memory for similar
> > reasons to the argument Clint made against synchronising the marking off
> of dependencies in-memory. The database can handle that and the problem
> of making the DB robust against failures of a single machine has already been
> solved by someone else. If we do it in-memory we are just creating a single
> point of failure for not much gain. (I guess you could argue it doesn't 
> matter,
> since if any Heat engine dies during the traversal then we'll have to kick off
> another one anyway, but it does limit our options if that changes in the
> future.) [Murugan, Visnusaran] Resource completes, removes itself from
> resource_lock and notifies engine. Engine will acquire parent lock and 
> initiate
> parent only if all its children are satisfied (no child entry in 
> resource_lock).
> This will come in place of Aggregator.
> 
> Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I did.
> The three differences I can see are:
> 
> 1) I think you are proposing to create all of the sync points at the start of 
> the
> traversal, rather than on an as-needed basis. This is probably a good idea. I
> didn't consider it because of the way my prototype evolved, but there's now
> no reason I can see not to do this.
> If we could move the data to the Resource table itself then we could even
> get it for free from an efficiency point of view.

+1. But we will need engine_id to be stored somewhere for recovery purposes
(in an easily queried format).
Sync points are created as needed. A single resource is enough to restart that
entire stream.
I think there is a disconnect in our understanding. I will detail it as well in 
the etherpad.

> 2) You're using a single list from which items 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-10 Thread Anant Patil
On 04-Dec-14 10:49, Zane Bitter wrote:
> On 01/12/14 02:02, Anant Patil wrote:
>> On GitHub:https://github.com/anantpatil/heat-convergence-poc
> 
> I'm trying to review this code at the moment, and finding some stuff I 
> don't understand:
> 
> https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
> 
> This appears to loop through all of the resources *prior* to kicking off 
> any actual updates to check if the resource will change. This is 
> impossible to do in general, since a resource may obtain a property 
> value from an attribute of another resource and there is no way to know 
> whether an update to said other resource would cause a change in the 
> attribute value.
> 
> In addition, no attempt to catch UpdateReplace is made. Although that 
> looks like a simple fix, I'm now worried about the level to which this 
> code has been tested.
> 
We were working on a new branch and, as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:
https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

When a new resource is taken for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will have all the "get_attr" resolved.

> 
> I'm also trying to wrap my head around how resources are cleaned up in 
> dependency order. If I understand correctly, you store in the 
> ResourceGraph table the dependencies between various resource names in 
> the current template (presumably there could also be some left around 
> from previous templates too?). For each resource name there may be a 
> number of rows in the Resource table, each with an incrementing version. 
> As far as I can tell though, there's nowhere that the dependency graph 
> for _previous_ templates is persisted? So if the dependency order 
> changes in the template we have no way of knowing the correct order to 
> clean up in any more? (There's not even a mechanism to associate a 
> resource version with a particular template, which might be one avenue 
> by which to recover the dependencies.)
> 
> I think this is an important case we need to be able to handle, so I 
> added a scenario to my test framework to exercise it and discovered that 
> my implementation was also buggy. Here's the fix: 
> https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40
> 

Thanks for pointing this out Zane. We too had a buggy implementation for
handling inverted dependencies. I had a hard look at our algorithm, where
we were continuously merging the edges from the new template into the edges
from previous updates. It was an optimized way of traversing the graph
in both forward and reverse order without missing any resources. But
when the dependencies are inverted, this wouldn't work.

We have changed our algorithm. The changes in edges are noted down in the
DB; only the delta of edges from the previous template is calculated and
kept. At any given point of time, the graph table has all the edges from the
current template and the delta from previous templates. Each edge has a
template ID associated with it. For resource clean-up, we start from the
first template (the template which was completed and on top of which updates
were made, or an empty template otherwise), and move towards the current
template in the order in which the updates were issued. For each
template, the graph (the edges found for that template) is traversed in
reverse order and resources are cleaned up. The process ends with the
current template being traversed in reverse order and its resources being
cleaned up. All the update-replaced resources from the older templates
(older updates in concurrent updates) are cleaned up in the order in
which they are supposed to be.
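
(Illustrative sketch of the clean-up walk described above; the data shapes and
names are ours, not the PoC's:)

    current = {'server_v2', 'attach_v2'}    # resources in the current template

    # Per-template edges, oldest update first; (dependent, required) means the
    # dependent has to be removed before what it requires.
    edges_by_template = [
        ('t-1', [('attach_v1', 'server_v1')]),
        ('t-2', [('attach_v2', 'server_v2')]),
    ]

    cleaned = set()
    for template_id, edges in edges_by_template:
        for dependent, required in edges:   # reverse order: dependent first
            for name in (dependent, required):
                if name not in current and name not in cleaned:
                    cleaned.add(name)
                    print('clean up', name, 'from', template_id)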

Resources are now tied to a template; they have a template_id instead of a
version. As we traverse the graph, we know which template we are working
on, and can take the relevant action on the resource.

For rollback, another update is issued with the last completed template
(by design, the last completed template defaults to an empty template).
The current template being worked on becomes the predecessor of
the new incoming template. In the case of rollback, the last completed
template becomes the incoming new template, the current one becomes the new
template's predecessor, and the successor of the last completed template will
have no predecessor. All these changes are available in the
'graph-version' branch. (The branch name is a misnomer though!)

I think it is simpler to think about the stack and concurrent updates when
we associate resources and edges with a template, and the stack with the
current template and its predecessors (if any).

I also think that we should decouple Resource from Stack. The current coupling
is really a hindrance when workers work on individual resources. The resource
should be abstracted enough from the stack for the worker to work on the
resource alone. The worker should load the required resource plug-in and
Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-10 Thread Zane Bitter

You really need to get a real email client with quoting support ;)

On 10/12/14 06:42, Murugan, Visnusaran wrote:

Well, we still have to persist the dependencies of each version of a resource 
_somehow_, because otherwise we can't know how to clean them up in the correct 
order. But what I think you meant to say is that this approach doesn't require 
it to be persisted in a separate table where the rows are marked as traversed 
as we work through the graph.

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of resources, we 
could get the order from old template. We'd prefer not to have a graph table.


In theory you could get it by keeping old templates around. But that 
means keeping a lot of templates, and it will be hard to keep track of 
when you want to delete them. It also means that when starting an update 
you'll need to load every existing previous version of the template in 
order to calculate the dependencies. It also leaves the dependencies in 
an ambiguous state when a resource fails, and although that can be 
worked around it will be a giant pain to implement.


I agree that I'd prefer not to have a graph table. After trying a couple 
of different things I decided to store the dependencies in the Resource 
table, where we can read or write them virtually for free because it 
turns out that we are always reading or updating the Resource itself at 
exactly the same time anyway.



This approach reduces DB queries by waiting for completion notification on a topic. The 
drawback I see is that delete stack stream will be huge as it will have the entire graph. 
We can always dump such data in ResourceLock.data Json and pass a simple flag 
"load_stream_from_db" to converge RPC call as a workaround for delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the 
key difference that the data is stored in-memory in a Heat engine rather than 
the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to 
the argument Clint made against synchronising the marking off of dependencies 
in-memory. The database can handle that and the problem of making the DB robust 
against failures of a single machine has already been solved by someone else. 
If we do it in-memory we are just creating a single point of failure for not 
much gain. (I guess you could argue it doesn't matter, since if any Heat engine 
dies during the traversal then we'll have to kick off another one anyway, but 
it does limit our options if that changes in the future.)
[Murugan, Visnusaran] Resource completes, removes itself from resource_lock and 
notifies engine. Engine will acquire parent lock and initiate parent only if 
all its children are satisfied (no child entry in resource_lock). This will 
come in place of Aggregator.


Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what 
I did. The three differences I can see are:


1) I think you are proposing to create all of the sync points at the 
start of the traversal, rather than on an as-needed basis. This is 
probably a good idea. I didn't consider it because of the way my 
prototype evolved, but there's now no reason I can see not to do this. 
If we could move the data to the Resource table itself then we could 
even get it for free from an efficiency point of view.
2) You're using a single list from which items are removed, rather than 
two lists (one static, and one to which items are added) that get 
compared. Assuming (1) then this is probably a good idea too.
3) You're suggesting to notify the engine unconditionally and let the 
engine decide if the list is empty. That's probably not a good idea - 
not only does it require extra reads, it introduces a race condition 
that you then have to solve (it can be solved, it's just more work). 
Since the update to remove a child from the list is atomic, it's best to 
just trigger the engine only if the list is now empty.
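
(Toy sketch of the "trigger only when the list becomes empty" rule; a lock
stands in for the atomic DB update, and the names are invented:)

    import threading

    class SyncPoint(object):
        def __init__(self, children):
            self._remaining = set(children)
            self._lock = threading.Lock()

        def satisfy(self, child):
            # Remove one child; return True only for the call that empties
            # the list, so exactly one caller goes on to trigger the parent.
            with self._lock:
                self._remaining.discard(child)
                return not self._remaining

    point = SyncPoint({'server', 'volume'})
    for child in ('server', 'volume'):
        if point.satisfy(child):
            print('all children done - notify the engine for the parent')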



It's not clear to me how the 'streams' differ in practical terms from just 
passing a serialisation of the Dependencies object, other than being 
incomprehensible to me ;). The current Dependencies implementation
(1) is a very generic implementation of a DAG, (2) works and has plenty of unit 
tests, (3) has, with I think one exception, a pretty straightforward API, (4) 
has a very simple serialisation, returned by the edges() method, which can be 
passed back into the constructor to recreate it, and (5) has an API that is to 
some extent relied upon by resources, and so won't likely be removed outright 
in any event.
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.
[Murugan, Visnusaran] Our thought was to reduce payload size (template/graph).
Just planning for the worst-case scenario (a million-resource stack). We could
always dump them in ResourceLock.data to be loaded by the Worker.


If there's a smaller representation of a graph than a list of edges then

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-10 Thread Murugan, Visnusaran


-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com] 
Sent: Tuesday, December 9, 2014 3:50 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On 08/12/14 07:00, Murugan, Visnusaran wrote:
>
> Hi Zane & Michael,
>
> Please have a look @ 
> https://etherpad.openstack.org/p/execution-stream-and-aggregator-based
> -convergence
>
> Updated with a combined approach which does not require persisting the graph or
> removing the backup stack.

Well, we still have to persist the dependencies of each version of a resource 
_somehow_, because otherwise we can't know how to clean them up in the correct 
order. But what I think you meant to say is that this approach doesn't require 
it to be persisted in a separate table where the rows are marked as traversed 
as we work through the graph.

[Murugan, Visnusaran]
In the case of rollback, where we have to clean up earlier versions of resources,
we could get the order from the old template. We'd prefer not to have a graph table.

> This approach reduces DB queries by waiting for completion notification on a 
> topic. The drawback I see is that delete stack stream will be huge as it will 
> have the entire graph. We can always dump such data in ResourceLock.data Json 
> and pass a simple flag "load_stream_from_db" to converge RPC call as a 
> workaround for delete operation.

This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the 
key difference that the data is stored in-memory in a Heat engine rather than 
the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to 
the argument Clint made against synchronising the marking off of dependencies 
in-memory. The database can handle that and the problem of making the DB robust 
against failures of a single machine has already been solved by someone else. 
If we do it in-memory we are just creating a single point of failure for not 
much gain. (I guess you could argue it doesn't matter, since if any Heat engine 
dies during the traversal then we'll have to kick off another one anyway, but 
it does limit our options if that changes in the future.)
[Murugan, Visnusaran] Resource completes, removes itself from resource_lock and 
notifies engine. Engine will acquire parent lock and initiate parent only if 
all its children are satisfied (no child entry in resource_lock). This will 
come in place of Aggregator.

It's not clear to me how the 'streams' differ in practical terms from just 
passing a serialisation of the Dependencies object, other than being 
incomprehensible to me ;). The current Dependencies implementation
(1) is a very generic implementation of a DAG, (2) works and has plenty of unit 
tests, (3) has, with I think one exception, a pretty straightforward API, (4) 
has a very simple serialisation, returned by the edges() method, which can be 
passed back into the constructor to recreate it, and (5) has an API that is to 
some extent relied upon by resources, and so won't likely be removed outright 
in any event. 
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.
[Murugan, Visnusaran] Our thought was to reduce payload size (template/graph).
Just planning for the worst-case scenario (a million-resource stack). We could
always dump them in ResourceLock.data to be loaded by the Worker.

I think the difference may be that the streams only include the
*shortest* paths (there will often be more than one) to each resource. i.e.

  A <--- B <--- C
  ^             |
  |             |
  +-------------+

can just be written as:

  A <--- B <--- C

because there's only one order in which that can execute anyway. (If we're 
going to do this though, we should just add a method to the dependencies.Graph 
class to delete redundant edges, not create a whole new data structure.) There 
is a big potential advantage here in that it reduces the theoretical maximum 
number of edges in the graph from O(n^2) to O(n). (Although in practice real 
templates are typically not likely to have such dense graphs.)

There's a downside to this too though: say that A in the above diagram is 
replaced during an update. In that case not only B but also C will need to 
figure out what the latest version of A is. One option here is to pass that 
data along via B, but that will become very messy to implement in a non-trivial 
example. The other would be for C to go search in the database for resources 
with the same name as A and the current traversal_id marked as the latest. But 
that not only creates a concurrency problem we didn't have before (A could have 
been updated with a new traversal_id at some point after C had established that 
the current traversal was still valid but before

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-08 Thread Zane Bitter

On 08/12/14 07:00, Murugan, Visnusaran wrote:


Hi Zane & Michael,

Please have a look @ 
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

Updated with a combined approach which does not require persisting graph and 
backup stack removal.


Well, we still have to persist the dependencies of each version of a 
resource _somehow_, because otherwise we can't know how to clean them up 
in the correct order. But what I think you meant to say is that this 
approach doesn't require it to be persisted in a separate table where 
the rows are marked as traversed as we work through the graph.



This approach reduces DB queries by waiting for completion notification on a topic. The 
drawback I see is that delete stack stream will be huge as it will have the entire graph. 
We can always dump such data in ResourceLock.data Json and pass a simple flag 
"load_stream_from_db" to converge RPC call as a workaround for delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], 
with the key difference that the data is stored in-memory in a Heat 
engine rather than the database.


I suspect it's probably a mistake to move it in-memory for similar 
reasons to the argument Clint made against synchronising the marking off 
of dependencies in-memory. The database can handle that and the problem 
of making the DB robust against failures of a single machine has already 
been solved by someone else. If we do it in-memory we are just creating 
a single point of failure for not much gain. (I guess you could argue it 
doesn't matter, since if any Heat engine dies during the traversal then 
we'll have to kick off another one anyway, but it does limit our options 
if that changes in the future.)


It's not clear to me how the 'streams' differ in practical terms from 
just passing a serialisation of the Dependencies object, other than 
being incomprehensible to me ;). The current Dependencies implementation 
(1) is a very generic implementation of a DAG, (2) works and has plenty 
of unit tests, (3) has, with I think one exception, a pretty 
straightforward API, (4) has a very simple serialisation, returned by 
the edges() method, which can be passed back into the constructor to 
recreate it, and (5) has an API that is to some extent relied upon by 
resources, and so won't likely be removed outright in any event. 
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.
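(As a concrete illustration of that round trip, something along these lines; a
sketch only, relying on the edges()/constructor behaviour described above rather
than a verified API:)

    # Sketch: persist and restore a dependency graph via its own
    # serialisation. Assumes, per the description above, that edges()
    # returns (requirer, required) pairs and that the constructor
    # accepts the same list back.
    import json
    from heat.engine.dependencies import Dependencies

    deps = Dependencies([('B', 'A'), ('C', 'B')])

    blob = json.dumps([list(edge) for edge in deps.edges()])  # e.g. into a DB column

    restored = Dependencies(json.loads(blob))                 # recreate later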


I think the difference may be that the streams only include the 
*shortest* paths (there will often be more than one) to each resource. i.e.


 A <--- B <--- C
 ^             |
 |             |
 +-------------+

can just be written as:

 A <--- B <--- C

because there's only one order in which that can execute anyway. (If 
we're going to do this though, we should just add a method to the 
dependencies.Graph class to delete redundant edges, not create a whole 
new data structure.) There is a big potential advantage here in that it 
reduces the theoretical maximum number of edges in the graph from O(n^2) 
to O(n). (Although in practice real templates are typically not likely 
to have such dense graphs.)
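(A redundant-edge pass of that sort could be as small as the sketch below, written
here against a plain adjacency dict rather than the real dependencies.Graph class:)

    # Sketch: remove edges that are implied by a longer path, keeping only
    # the "shortest path" structure discussed above. `requires` maps each
    # node to the set of nodes it directly requires.
    def prune_redundant_edges(requires):
        def reachable_without(start, skip_edge):
            seen, stack = set(), [start]
            while stack:
                node = stack.pop()
                for nxt in requires.get(node, ()):
                    if (node, nxt) == skip_edge or nxt in seen:
                        continue
                    seen.add(nxt)
                    stack.append(nxt)
            return seen

        pruned = {n: set(reqs) for n, reqs in requires.items()}
        for node, reqs in requires.items():
            for req in reqs:
                # If req is still reachable without the direct edge,
                # the direct edge is redundant (valid for a DAG).
                if req in reachable_without(node, (node, req)):
                    pruned[node].discard(req)
        return pruned

    # C requires both B and (redundantly) A; B requires A.
    print(prune_redundant_edges({'C': {'B', 'A'}, 'B': {'A'}, 'A': set()}))
    # {'C': {'B'}, 'B': {'A'}, 'A': set()}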


There's a downside to this too though: say that A in the above diagram 
is replaced during an update. In that case not only B but also C will 
need to figure out what the latest version of A is. One option here is 
to pass that data along via B, but that will become very messy to 
implement in a non-trivial example. The other would be for C to go 
search in the database for resources with the same name as A and the 
current traversal_id marked as the latest. But that not only creates a 
concurrency problem we didn't have before (A could have been updated 
with a new traversal_id at some point after C had established that the 
current traversal was still valid but before it went looking for A), it 
also eliminates all of the performance gains from removing that edge in 
the first place.


[1] 
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/sync_point.py



To stop the current stack operation, we will use your traversal_id based approach.


+1 :)


If you feel the Aggregator model creates more queues, then we might have to 
poll the DB to get resource status. (Which will impact performance adversely :) )


For the reasons given above I would vote for doing this in the DB. I 
agree there will be a performance penalty for doing so, because we'll be 
paying for robustness.



Lock table: name(Unique - Resource_id), stack_id, engine_id, data (Json to 
store stream dict)


Based on our call on Thursday, I think you're taking the idea of the 
Lock table too literally. The point of referring to locks is that we can 
use the same concepts as the Lock table relies on to do atomic updates 
on a particular row of the database, and we can use those atomic updates 
to prevent race conditions.
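(In other words, something like the following compare-and-swap pattern; a
SQLAlchemy-flavoured sketch with illustrative table and column names, not Heat's
actual schema:)

    # Sketch: synchronise via an atomic conditional UPDATE instead of an
    # explicit lock. The UPDATE only succeeds if the row is still in the
    # state we expect, so two workers racing on the same row cannot both win.
    from sqlalchemy import create_engine, text

    db = create_engine('mysql+pymysql://heat:heat@localhost/heat')  # illustrative

    def try_claim(resource_id, expected_traversal, engine_id):
        with db.begin() as conn:
            result = conn.execute(
                text("UPDATE resource SET engine_id = :eid "
                     "WHERE id = :rid "
                     "AND current_traversal = :traversal "
                     "AND engine_id IS NULL"),
                {'eid': engine_id, 'rid': resource_id,
                 'traversal': expected_traversal})
            # rowcount == 1: we atomically claimed the row.
            # rowcount == 0: someone else won, or the traversal moved on.
            return result.rowcount == 1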

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-08 Thread Murugan, Visnusaran

Hi Zane & Michael,

Please have a look @ 
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

Updated with a combined approach which does not require persisting the graph or 
backup stack removal. This approach reduces DB queries by waiting for a completion 
notification on a topic. The drawback I see is that the delete stack stream will be 
huge, as it will have the entire graph. We can always dump such data in 
ResourceLock.data JSON and pass a simple flag "load_stream_from_db" to the converge 
RPC call as a workaround for the delete operation.

To stop the current stack operation, we will use your traversal_id based approach. 
If you feel the Aggregator model creates more queues, then we might have to 
poll the DB to get resource status. (Which will impact performance adversely :) )


Lock table: name(Unique - Resource_id), stack_id, engine_id, data (Json to 
store stream dict)
   
Your thoughts.
Vishnu (irc: ckmvishnu)
Unmesh (irc: unmeshg)


-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com] 
Sent: Thursday, December 4, 2014 10:50 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On 01/12/14 02:02, Anant Patil wrote:
> On GitHub:https://github.com/anantpatil/heat-convergence-poc

I'm trying to review this code at the moment, and finding some stuff I don't 
understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off any 
actual updates to check if the resource will change. This is impossible to do 
in general, since a resource may obtain a property value from an attribute of 
another resource and there is no way to know whether an update to said other 
resource would cause a change in the attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that looks 
like a simple fix, I'm now worried about the level to which this code has been 
tested.


I'm also trying to wrap my head around how resources are cleaned up in 
dependency order. If I understand correctly, you store in the ResourceGraph 
table the dependencies between various resource names in the current template 
(presumably there could also be some left around from previous templates too?). 
For each resource name there may be a number of rows in the Resource table, 
each with an incrementing version. 
As far as I can tell though, there's nowhere that the dependency graph for 
_previous_ templates is persisted? So if the dependency order changes in the 
template we have no way of knowing the correct order to clean up in any more? 
(There's not even a mechanism to associate a resource version with a particular 
template, which might be one avenue by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I added a 
scenario to my test framework to exercise it and discovered that my 
implementation was also buggy. Here's the fix: 
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40


> It was difficult, for me personally, to completely understand Zane's 
> PoC and how it would lay the foundation for aforementioned design 
> goals. It would be very helpful to have Zane's understanding here. I 
> could understand that there are ideas like async message passing and 
> notifying the parent which we also subscribe to.

So I guess the thing to note is that there are essentially two parts to my Poc:
1) A simulation framework that takes what will be in the final implementation 
multiple tasks running in parallel in separate processes and talking to a 
database, and replaces it with an event loop that runs the tasks sequentially 
in a single process with an in-memory data store. 
I could have built a more realistic simulator using Celery or something, but I 
preferred this way as it offers deterministic tests.
2) A toy implementation of Heat on top of this framework.

The files map roughly to Heat something like this:

converge.engine   -> heat.engine.service
converge.stack-> heat.engine.stack
converge.resource -> heat.engine.resource
converge.template -> heat.engine.template
converge.dependencies -> actually is heat.engine.dependencies
converge.sync_point   -> no equivalent
converge.converger-> no equivalent (this is convergence "worker")
converge.reality  -> represents the actual OpenStack services

For convenience, I just use the @asynchronous decorator to turn an ordinary 
method call into a simulated message.

The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just
updates) we create any new resources in the DB and calculate the dependency graph 
for the update from the data in the DB and template. This 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-03 Thread Zane Bitter

On 01/12/14 02:02, Anant Patil wrote:

On GitHub:https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I 
don't understand:


https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off 
any actual updates to check if the resource will change. This is 
impossible to do in general, since a resource may obtain a property 
value from an attribute of another resource and there is no way to know 
whether an update to said other resource would cause a change in the 
attribute value.


In addition, no attempt to catch UpdateReplace is made. Although that 
looks like a simple fix, I'm now worried about the level to which this 
code has been tested.
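(For reference, the fix being alluded to is roughly of this shape; a simplified
sketch with the import path from memory and the helper name invented for
illustration — only the UpdateReplace exception itself is the Heat concept being
discussed here:)

    # Sketch: a resource that cannot be updated in place raises
    # UpdateReplace, and the caller falls back to creating a replacement
    # (keeping the old version around for rollback until cleanup).
    # make_replacement() is an illustrative name, not a claim about Heat's API.
    from heat.engine import resource

    def update_or_replace(rsrc, new_definition):
        try:
            return rsrc.update(new_definition)
        except resource.UpdateReplace:
            replacement = rsrc.make_replacement(new_definition)  # illustrative
            return replacement.create()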



I'm also trying to wrap my head around how resources are cleaned up in 
dependency order. If I understand correctly, you store in the 
ResourceGraph table the dependencies between various resource names in 
the current template (presumably there could also be some left around 
from previous templates too?). For each resource name there may be a 
number of rows in the Resource table, each with an incrementing version. 
As far as I can tell though, there's nowhere that the dependency graph 
for _previous_ templates is persisted? So if the dependency order 
changes in the template we have no way of knowing the correct order to 
clean up in any more? (There's not even a mechanism to associate a 
resource version with a particular template, which might be one avenue 
by which to recover the dependencies.)


I think this is an important case we need to be able to handle, so I 
added a scenario to my test framework to exercise it and discovered that 
my implementation was also buggy. Here's the fix: 
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40




It was difficult, for me personally, to completely understand Zane's PoC
and how it would lay the foundation for aforementioned design goals. It
would be very helpful to have Zane's understanding here. I could
understand that there are ideas like async message passing and notifying
the parent which we also subscribe to.


So I guess the thing to note is that there are essentially two parts to 
my Poc:
1) A simulation framework that takes what will be in the final 
implementation multiple tasks running in parallel in separate processes 
and talking to a database, and replaces it with an event loop that runs 
the tasks sequentially in a single process with an in-memory data store. 
I could have built a more realistic simulator using Celery or something, 
but I preferred this way as it offers deterministic tests.

2) A toy implementation of Heat on top of this framework.

The files map roughly to Heat something like this:

converge.engine   -> heat.engine.service
converge.stack-> heat.engine.stack
converge.resource -> heat.engine.resource
converge.template -> heat.engine.template
converge.dependencies -> actually is heat.engine.dependencies
converge.sync_point   -> no equivalent
converge.converger-> no equivalent (this is convergence "worker")
converge.reality  -> represents the actual OpenStack services

For convenience, I just use the @asynchronous decorator to turn an 
ordinary method call into a simulated message.
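(A decorator of that sort can be tiny; the following is an illustrative sketch of
the idea, not the actual converge code:)

    # Sketch: turn a method call into a "message" by queueing it on a
    # single-threaded event loop instead of executing it immediately.
    import collections
    import functools

    _message_queue = collections.deque()

    def asynchronous(method):
        @functools.wraps(method)
        def queue_call(*args, **kwargs):
            _message_queue.append((method, args, kwargs))
        return queue_call

    def run_event_loop():
        # Deterministic: messages are processed one at a time, in order.
        while _message_queue:
            method, args, kwargs = _message_queue.popleft()
            method(*args, **kwargs)

    class Converger(object):
        @asynchronous
        def check_resource(self, resource_name):
            print('checking %s' % resource_name)

    Converger().check_resource('my_server')  # queued, not executed yet
    run_event_loop()                         # now it runs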


The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just 
updates) we create any new resources in the DB and calculate the dependency 
graph for the update from the data in the DB and template. This graph is 
the same one used by updates in Heat currently, so it contains both the 
forward and reverse (cleanup) dependencies. The stack update then kicks 
off checks of all the leaf nodes, passing the pre-calculated dependency 
graph.


Each resource check may result in a call to the create(), update() or 
delete() methods of a Resource plugin. The resource also reads any 
attributes that will be required from it. Once this is complete, it 
triggers any dependent resources that are ready, or updates a SyncPoint 
in the database if there are dependent resources that have multiple 
requirements. The message triggering the next resource will contain the 
dependency graph again, as well as the RefIds and required attributes of 
any resources it depends on.
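(A SyncPoint, in other words, is essentially an atomic countdown plus a place to
accumulate the incoming RefIds/attributes; a minimal in-memory sketch — the real
thing would do the decrement with an atomic DB update:)

    # Sketch: trigger a node only when the last of its requirements
    # reports in, accumulating the data each requirement passes along.
    class SyncPoint(object):
        def __init__(self, required_count):
            self.remaining = required_count
            self.input_data = {}

        def satisfy(self, sender, data):
            self.input_data[sender] = data
            self.remaining -= 1
            return self.remaining == 0   # True -> dependent node is ready

    sync_points = {'C': SyncPoint(required_count=2)}

    def notify(dependent, sender, data):
        if sync_points[dependent].satisfy(sender, data):
            print('triggering %s with %s'
                  % (dependent, sync_points[dependent].input_data))

    notify('C', 'A', {'ref_id': 'a-123'})   # not yet
    notify('C', 'B', {'ref_id': 'b-456'})   # last requirement -> triggers C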


The new dependencies thus created are added to the resource itself in 
the database at the time it is checked, allowing it to record the 
changes caused by a requirement being unexpectedly replaced without 
needing a global lock on anything.


When cleaning up resources, we also endeavour to remove any that are 
successfully deleted from the dependencies graph.


Each traversal has a unique ID that is both stored in the stack and 
passed down through the resource check triggers. (At present this is the 
template ID, but it may make more sense to have a unique ID since old 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-02 10:37:31 -0800:
> Yes, that's the synchronization block for which we use the stack lock.
> Currently, a thread spin waits to acquire the lock to enter this critical
> section.
> 
> I don't really know how to do application level transaction. Is there an
> external library for that? AFAIK, we cannot switch from DB transaction to
> application level to do some computation to resolve next set of resources,
> submit to async queue, and again continue with the DB transaction. The
> transactional method looks attractive and clean to me but I am limited by
> my knowledge. Or do you mean to have the DB transaction object made available
> via DB APIs and use them in the application? Please share your thoughts.
> 

Every DB request has a context attached to the currently running
greenthread which has the DB session object. So yes, you do begin the
transaction in one db API call, and try to commit it in another after
having attempted any application logic.  The whole thing should always be
in a try/except retry loop to handle deadlocks and conflicts introduced
by multi-master sync replication like Galera.
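(The shape of that loop is roughly as follows; a sketch — the exact exception
classes to retry on are whatever the DB layer raises for deadlocks/conflicts, and
are only assumed here:)

    # Sketch: begin a transaction, do the application logic inside it,
    # and retry the whole unit on deadlock/conflict (e.g. under Galera).
    import time

    RETRIABLE = (Exception,)   # assumption: replace with the DB layer's
                               # deadlock/conflict exception classes

    def run_in_transaction(session_factory, work, attempts=5):
        for attempt in range(attempts):
            session = session_factory()
            try:
                with session.begin():
                    # `work` does the reads *and* the application logic
                    # deciding what to write, all in one transaction.
                    return work(session)
            except RETRIABLE:
                if attempt == attempts - 1:
                    raise
                time.sleep(0.1 * (attempt + 1))   # simple backoff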

For non-transactional backends, they would have to use a spin lock like
you're using now.

> To avoid locking the stack, we were thinking of designating a single engine
> with responsibility of processing all notifications for a stack. All the
> workers will notify on the stack topic, which only one engine listens to
> and then the notifications end up in a queue (local to the engine, per
> stack), from where they are taken up one-by-one to continue the stack
> operation. The convergence jobs are produced by this engine for the stack
> it is responsible for, and they might end-up in any of the engines. But the
> notifications for a stack are directed to one engine to avoid contention
> for lock. The convergence load is leveled and distributed, and stack lock
> is not needed.
> 

No don't do that. You already have that engine, it is the database engine
and it is intended to be used for synchronization and will achieve a
higher degree of concurrency in bigger stacks because it will only block
two workers when they try to inspect or change the same rows.
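(i.e. let the rows be the synchronisation primitive; in SQLAlchemy terms something
like the sketch below, where `Resource` stands in for an illustrative mapped model
rather than Heat's actual schema:)

    # Sketch: row-level locking in the database. Only workers touching the
    # same resource row block each other; everything else proceeds.
    def check_resource(session, Resource, resource_id, do_check):
        with session.begin():
            rsrc = (session.query(Resource)
                           .filter_by(id=resource_id)
                           .with_for_update()      # SELECT ... FOR UPDATE
                           .one())
            # Any other worker selecting this row FOR UPDATE now waits
            # until this transaction commits or rolls back.
            do_check(rsrc)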

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Anant Patil
Yes, that's the synchronization block for which we use the stack lock.
Currently, a thread spin waits to acquire the lock to enter this critical
section.

I don't really know how to do application level transaction. Is there an
external library for that? AFAIK, we cannot switch from DB transaction to
application level to do some computation to resolve next set of resources,
submit to async queue, and again continue with the DB transaction. The
transactional method looks attractive and clean to me but I am limited by
my knowledge. Or do you mean to have the DB transaction object made available
via DB APIs and use them in the application? Please share your thoughts.

To avoid locking the stack, we were thinking of designating a single engine
with responsibility of processing all notifications for a stack. All the
workers will notify on the stack topic, which only one engine listens to
and then the notifications end up in a queue (local to the engine, per
stack), from where they are taken up one-by-one to continue the stack
operation. The convergence jobs are produced by this engine for the stack
it is responsible for, and they might end-up in any of the engines. But the
notifications for a stack are directed to one engine to avoid contention
for lock. The convergence load is leveled and distributed, and stack lock
is not needed.

- Anant

On Tue, Dec 2, 2014 at 1:49 AM, Clint Byrum  wrote:

> Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
> > On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
> > > Hi Zane,
> > >
> > >
> > >
> > > At this stage our implementation (as mentioned in wiki
> > > <https://wiki.openstack.org/wiki/Heat/ConvergenceDesign>) achieves
> your
> > > design goals.
> > >
> > >
> > >
> > > 1.   In case of a parallel update, our implementation adjusts graph
> > > according to new template and waits for dispatched resource tasks to
> > > complete.
> > >
> > > 2.   Reason for basing our PoC on Heat code:
> > >
> > > a.   To solve contention processing parent resource by all
> dependent
> > > resources in parallel.
> > >
> > > b.  To avoid porting issue from PoC to HeatBase. (just to be aware
> > > of potential issues asap)
> > >
> > > 3.   Resource timeout would be helpful, but I guess its resource
> > > specific and has to come from template and default values from plugins.
> > >
> > > 4.   We see resource notification aggregation and processing next
> > > level of resources without contention and with minimal DB usage as the
> > > problem area. We are working on the following approaches in *parallel.*
> > >
> > > a.   Use a Queue per stack to serialize notification.
> > >
> > > b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
> > > convergence upon first child notification. Subsequent children who fail
> > > to get parent resource lock will directly send message to waiting
> parent
> > > task (topic=stack_id.parent_resource_id)
> > >
> > > Based on performance/feedback we can select either or a mashed version.
> > >
> > >
> > >
> > > Advantages:
> > >
> > > 1.   Failed Resource tasks can be re-initiated after ProcessLog
> > > table lookup.
> > >
> > > 2.   One worker == one resource.
> > >
> > > 3.   Supports concurrent updates
> > >
> > > 4.   Delete == update with empty stack
> > >
> > > 5.   Rollback == update to previous known good/completed stack.
> > >
> > >
> > >
> > > Disadvantages:
> > >
> > > 1.   Still holds stackLock (WIP to remove with ProcessLog)
> > >
> > >
> > >
> > > Completely understand your concern on reviewing our code, since commits
> > > are numerous and there is change of course at places.  Our start commit
> > > is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not
> help J
> > >
> > >
> > >
> > > Your Thoughts.
> > >
> > >
> > >
> > > Happy Thanksgiving.
> > >
> > > Vishnu.
> > >
> > >
> > >
> > > *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
> > > *Sent:* Thursday, November 27, 2014 9:46 AM
> > > *To:* OpenStack Development Mailing List (not for usage questions)
> > > *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> > >
> > >
> > >
> > > On

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Anant Patil
On Tue, Dec 2, 2014 at 1:49 AM, Clint Byrum  wrote:

> Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
> > On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
> > > Hi Zane,
> > >
> > >
> > >
> > > At this stage our implementation (as mentioned in wiki
> > > <https://wiki.openstack.org/wiki/Heat/ConvergenceDesign>) achieves
> your
> > > design goals.
> > >
> > >
> > >
> > > 1.   In case of a parallel update, our implementation adjusts graph
> > > according to new template and waits for dispatched resource tasks to
> > > complete.
> > >
> > > 2.   Reason for basing our PoC on Heat code:
> > >
> > > a.   To solve contention processing parent resource by all
> dependent
> > > resources in parallel.
> > >
> > > b.  To avoid porting issue from PoC to HeatBase. (just to be aware
> > > of potential issues asap)
> > >
> > > 3.   Resource timeout would be helpful, but I guess its resource
> > > specific and has to come from template and default values from plugins.
> > >
> > > 4.   We see resource notification aggregation and processing next
> > > level of resources without contention and with minimal DB usage as the
> > > problem area. We are working on the following approaches in *parallel.*
> > >
> > > a.   Use a Queue per stack to serialize notification.
> > >
> > > b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
> > > convergence upon first child notification. Subsequent children who fail
> > > to get parent resource lock will directly send message to waiting
> parent
> > > task (topic=stack_id.parent_resource_id)
> > >
> > > Based on performance/feedback we can select either or a mashed version.
> > >
> > >
> > >
> > > Advantages:
> > >
> > > 1.   Failed Resource tasks can be re-initiated after ProcessLog
> > > table lookup.
> > >
> > > 2.   One worker == one resource.
> > >
> > > 3.   Supports concurrent updates
> > >
> > > 4.   Delete == update with empty stack
> > >
> > > 5.   Rollback == update to previous known good/completed stack.
> > >
> > >
> > >
> > > Disadvantages:
> > >
> > > 1.   Still holds stackLock (WIP to remove with ProcessLog)
> > >
> > >
> > >
> > > Completely understand your concern on reviewing our code, since commits
> > > are numerous and there is change of course at places.  Our start commit
> > > is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not
> help J
> > >
> > >
> > >
> > > Your Thoughts.
> > >
> > >
> > >
> > > Happy Thanksgiving.
> > >
> > > Vishnu.
> > >
> > >
> > >
> > > *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
> > > *Sent:* Thursday, November 27, 2014 9:46 AM
> > > *To:* OpenStack Development Mailing List (not for usage questions)
> > > *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> > >
> > >
> > >
> > > On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter  > > <mailto:zbit...@redhat.com>> wrote:
> > >
> > > A bunch of us have spent the last few weeks working independently
> on
> > > proof of concept designs for the convergence architecture. I think
> > > those efforts have now reached a sufficient level of maturity that
> > > we should start working together on synthesising them into a plan
> > > that everyone can forge ahead with. As a starting point I'm going
> to
> > > summarise my take on the three efforts; hopefully the authors of
> the
> > > other two will weigh in to give us their perspective.
> > >
> > >
> > > Zane's Proposal
> > > ===
> > >
> > >
> https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
> > >
> > > I implemented this as a simulator of the algorithm rather than
> using
> > > the Heat codebase itself in order to be able to iterate rapidly on
> > > the design, and indeed I have changed my mind many, many times in
> > > the process of implementing it. Its notable departure from a
> > > realistic simulation is that it runs only on

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-01 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
> On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
> > Hi Zane,
> > 
> >  
> > 
> > At this stage our implementation (as mentioned in wiki
> > <https://wiki.openstack.org/wiki/Heat/ConvergenceDesign>) achieves your
> > design goals.
> > 
> >  
> > 
> > 1.   In case of a parallel update, our implementation adjusts graph
> > according to new template and waits for dispatched resource tasks to
> > complete.
> > 
> > 2.   Reason for basing our PoC on Heat code:
> > 
> > a.   To solve contention processing parent resource by all dependent
> > resources in parallel.
> > 
> > b.  To avoid porting issue from PoC to HeatBase. (just to be aware
> > of potential issues asap)
> > 
> > 3.   Resource timeout would be helpful, but I guess its resource
> > specific and has to come from template and default values from plugins.
> > 
> > 4.   We see resource notification aggregation and processing next
> > level of resources without contention and with minimal DB usage as the
> > problem area. We are working on the following approaches in *parallel.*
> > 
> > a.   Use a Queue per stack to serialize notification.
> > 
> > b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
> > convergence upon first child notification. Subsequent children who fail
> > to get parent resource lock will directly send message to waiting parent
> > task (topic=stack_id.parent_resource_id)
> > 
> > Based on performance/feedback we can select either or a mashed version.
> > 
> >  
> > 
> > Advantages:
> > 
> > 1.   Failed Resource tasks can be re-initiated after ProcessLog
> > table lookup.
> > 
> > 2.   One worker == one resource.
> > 
> > 3.   Supports concurrent updates
> > 
> > 4.   Delete == update with empty stack
> > 
> > 5.   Rollback == update to previous known good/completed stack.
> > 
> >  
> > 
> > Disadvantages:
> > 
> > 1.   Still holds stackLock (WIP to remove with ProcessLog)
> > 
> >  
> > 
> > Completely understand your concern on reviewing our code, since commits
> > are numerous and there is change of course at places.  Our start commit
> > is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help J
> > 
> >  
> > 
> > Your Thoughts.
> > 
> >  
> > 
> > Happy Thanksgiving.
> > 
> > Vishnu.
> > 
> >  
> > 
> > *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
> > *Sent:* Thursday, November 27, 2014 9:46 AM
> > *To:* OpenStack Development Mailing List (not for usage questions)
> > *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
> > 
> >  
> > 
> > On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter  > <mailto:zbit...@redhat.com>> wrote:
> > 
> > A bunch of us have spent the last few weeks working independently on
> > proof of concept designs for the convergence architecture. I think
> > those efforts have now reached a sufficient level of maturity that
> > we should start working together on synthesising them into a plan
> > that everyone can forge ahead with. As a starting point I'm going to
> > summarise my take on the three efforts; hopefully the authors of the
> > other two will weigh in to give us their perspective.
> > 
> > 
> > Zane's Proposal
> > ===
> > 
> > 
> > https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
> > 
> > I implemented this as a simulator of the algorithm rather than using
> > the Heat codebase itself in order to be able to iterate rapidly on
> > the design, and indeed I have changed my mind many, many times in
> > the process of implementing it. Its notable departure from a
> > realistic simulation is that it runs only one operation at a time -
> > essentially giving up the ability to detect race conditions in
> > exchange for a completely deterministic test framework. You just
> > have to imagine where the locks need to be. Incidentally, the test
> > framework is designed so that it can easily be ported to the actual
> > Heat code base as functional tests so that the same scenarios could
> > be used without modification, allowing us to have confidence that
> > the eventual implementation is a faithful replicatio

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-30 Thread Anant Patil
On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
> Hi Zane,
> 
>  
> 
> At this stage our implementation (as mentioned in wiki
> <https://wiki.openstack.org/wiki/Heat/ConvergenceDesign>) achieves your
> design goals.
> 
>  
> 
> 1.   In case of a parallel update, our implementation adjusts graph
> according to new template and waits for dispatched resource tasks to
> complete.
> 
> 2.   Reason for basing our PoC on Heat code:
> 
> a.   To solve contention processing parent resource by all dependent
> resources in parallel.
> 
> b.  To avoid porting issue from PoC to HeatBase. (just to be aware
> of potential issues asap)
> 
> 3.   Resource timeout would be helpful, but I guess its resource
> specific and has to come from template and default values from plugins.
> 
> 4.   We see resource notification aggregation and processing next
> level of resources without contention and with minimal DB usage as the
> problem area. We are working on the following approaches in *parallel.*
> 
> a.   Use a Queue per stack to serialize notification.
> 
> b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
> convergence upon first child notification. Subsequent children who fail
> to get parent resource lock will directly send message to waiting parent
> task (topic=stack_id.parent_resource_id)
> 
> Based on performance/feedback we can select either or a mashed version.
> 
>  
> 
> Advantages:
> 
> 1.   Failed Resource tasks can be re-initiated after ProcessLog
> table lookup.
> 
> 2.   One worker == one resource.
> 
> 3.   Supports concurrent updates
> 
> 4.   Delete == update with empty stack
> 
> 5.   Rollback == update to previous known good/completed stack.
> 
>  
> 
> Disadvantages:
> 
> 1.   Still holds stackLock (WIP to remove with ProcessLog)
> 
>  
> 
> Completely understand your concern on reviewing our code, since commits
> are numerous and there is change of course at places.  Our start commit
> is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help J
> 
>  
> 
> Your Thoughts.
> 
>  
> 
> Happy Thanksgiving.
> 
> Vishnu.
> 
>  
> 
> *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
> *Sent:* Thursday, November 27, 2014 9:46 AM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
> 
>  
> 
> On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter  <mailto:zbit...@redhat.com>> wrote:
> 
> A bunch of us have spent the last few weeks working independently on
> proof of concept designs for the convergence architecture. I think
> those efforts have now reached a sufficient level of maturity that
> we should start working together on synthesising them into a plan
> that everyone can forge ahead with. As a starting point I'm going to
> summarise my take on the three efforts; hopefully the authors of the
> other two will weigh in to give us their perspective.
> 
> 
> Zane's Proposal
> ===
> 
> https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
> 
> I implemented this as a simulator of the algorithm rather than using
> the Heat codebase itself in order to be able to iterate rapidly on
> the design, and indeed I have changed my mind many, many times in
> the process of implementing it. Its notable departure from a
> realistic simulation is that it runs only one operation at a time -
> essentially giving up the ability to detect race conditions in
> exchange for a completely deterministic test framework. You just
> have to imagine where the locks need to be. Incidentally, the test
> framework is designed so that it can easily be ported to the actual
> Heat code base as functional tests so that the same scenarios could
> be used without modification, allowing us to have confidence that
> the eventual implementation is a faithful replication of the
> simulation (which can be rapidly experimented on, adjusted and
> tested when we inevitably run into implementation issues).
> 
> This is a complete implementation of Phase 1 (i.e. using existing
> resource plugins), including update-during-update, resource
> clean-up, replace on update and rollback; with tests.
> 
> Some of the design goals which were successfully incorporated:
> - Minimise changes to Heat (it's essentially a distributed version
> of the existing algorithm), and in particular to the database
> - Work with the existing plugin API
> - L

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-27 Thread Murugan, Visnusaran
Hi Zane,

At this stage our implementation (as mentioned in 
wiki<https://wiki.openstack.org/wiki/Heat/ConvergenceDesign>) achieves your 
design goals.


1.   In case of a parallel update, our implementation adjusts the graph 
according to the new template and waits for dispatched resource tasks to complete.

2.   Reason for basing our PoC on Heat code:

a.   To solve contention processing parent resource by all dependent 
resources in parallel.

b.  To avoid porting issues from PoC to HeatBase. (just to be aware of 
potential issues asap)

3.   Resource timeout would be helpful, but I guess it's resource specific 
and has to come from the template, with default values from plugins.

4.   We see resource notification aggregation and processing next level of 
resources without contention and with minimal DB usage as the problem area. We 
are working on the following approaches in parallel.

a.   Use a Queue per stack to serialize notification.

b.  Get parent ProcessLog (ResourceID, EngineID) and initiate convergence 
upon first child notification. Subsequent children who fail to get parent 
resource lock will directly send message to waiting parent task 
(topic=stack_id.parent_resource_id)
Based on performance/feedback we can select either one, or a combination of both.
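(A sketch of option (b) in oslo.messaging terms, with the topic built per the
scheme above; the RPC method name and payload are illustrative assumptions:)

    # Sketch: a child that fails to get the parent resource lock casts its
    # completion notification straight to the waiting parent task's topic.
    import oslo_messaging as messaging
    from oslo_config import cfg

    def notify_parent(ctxt, stack_id, parent_resource_id, child_id):
        transport = messaging.get_transport(cfg.CONF)
        target = messaging.Target(
            topic='%s.%s' % (stack_id, parent_resource_id))
        client = messaging.RPCClient(transport, target)
        # 'child_complete' and its arguments are illustrative only.
        client.cast(ctxt, 'child_complete', child_id=child_id)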

Advantages:

1.   Failed Resource tasks can be re-initiated after ProcessLog table 
lookup.

2.   One worker == one resource.

3.   Supports concurrent updates

4.   Delete == update with empty stack

5.   Rollback == update to previous known good/completed stack.

Disadvantages:

1.   Still holds stackLock (WIP to remove with ProcessLog)

Completely understand your concern about reviewing our code, since the commits are 
numerous and there are changes of course in places. Our start commit is 
[c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help ☺

Your Thoughts.

Happy Thanksgiving.
Vishnu.

From: Angus Salkeld [mailto:asalk...@mirantis.com]
Sent: Thursday, November 27, 2014 9:46 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter 
mailto:zbit...@redhat.com>> wrote:
A bunch of us have spent the last few weeks working independently on proof of 
concept designs for the convergence architecture. I think those efforts have 
now reached a sufficient level of maturity that we should start working 
together on synthesising them into a plan that everyone can forge ahead with. 
As a starting point I'm going to summarise my take on the three efforts; 
hopefully the authors of the other two will weigh in to give us their 
perspective.


Zane's Proposal
===

https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph

I implemented this as a simulator of the algorithm rather than using the Heat 
codebase itself in order to be able to iterate rapidly on the design, and 
indeed I have changed my mind many, many times in the process of implementing 
it. Its notable departure from a realistic simulation is that it runs only one 
operation at a time - essentially giving up the ability to detect race 
conditions in exchange for a completely deterministic test framework. You just 
have to imagine where the locks need to be. Incidentally, the test framework is 
designed so that it can easily be ported to the actual Heat code base as 
functional tests so that the same scenarios could be used without modification, 
allowing us to have confidence that the eventual implementation is a faithful 
replication of the simulation (which can be rapidly experimented on, adjusted 
and tested when we inevitably run into implementation issues).

This is a complete implementation of Phase 1 (i.e. using existing resource 
plugins), including update-during-update, resource clean-up, replace on update 
and rollback; with tests.

Some of the design goals which were successfully incorporated:
- Minimise changes to Heat (it's essentially a distributed version of the 
existing algorithm), and in particular to the database
- Work with the existing plugin API
- Limit total DB access for Resource/Stack to O(n) in the number of resources
- Limit overall DB access to O(m) in the number of edges
- Limit lock contention to only those operations actually contending (i.e. no 
global locks)
- Each worker task deals with only one resource
- Only read resource attributes once


Open questions:
- What do we do when we encounter a resource that is in progress from a 
previous update while doing a subsequent update? Obviously we don't want to 
interrupt it, as it will likely be left in an unknown state. Making a 
replacement is one obvious answer, but in many cases there could be serious 
down-sides to that. How long should we wait before trying it? What if it's 
still in progress because the engine processing the resource already died?

Also, how do we

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-27 Thread Jastrzebski, Michal
On 11/27/2014 at 5:15 AM, Angus Salkeld wrote:
> On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter wrote:
> 
> A bunch of us have spent the last few weeks working independently on
> proof of concept designs for the convergence architecture. I think
> those efforts have now reached a sufficient level of maturity that
> we should start working together on synthesising them into a plan
> that everyone can forge ahead with. As a starting point I'm going to
> summarise my take on the three efforts; hopefully the authors of the
> other two will weigh in to give us their perspective.
> 
> 
> Zane's Proposal
> ===
> 
> 
> https://github.com/zaneb/heat-__convergence-prototype/tree/__distributed-graph
> 
> 
> 
> I implemented this as a simulator of the algorithm rather than using
> the Heat codebase itself in order to be able to iterate rapidly on
> the design, and indeed I have changed my mind many, many times in
> the process of implementing it. Its notable departure from a
> realistic simulation is that it runs only one operation at a time -
> essentially giving up the ability to detect race conditions in
> exchange for a completely deterministic test framework. You just
> have to imagine where the locks need to be. Incidentally, the test
> framework is designed so that it can easily be ported to the actual
> Heat code base as functional tests so that the same scenarios could
> be used without modification, allowing us to have confidence that
> the eventual implementation is a faithful replication of the
> simulation (which can be rapidly experimented on, adjusted and
> tested when we inevitably run into implementation issues).
> 
> This is a complete implementation of Phase 1 (i.e. using existing
> resource plugins), including update-during-update, resource
> clean-up, replace on update and rollback; with tests.
> 
> Some of the design goals which were successfully incorporated:
> - Minimise changes to Heat (it's essentially a distributed version
> of the existing algorithm), and in particular to the database
> - Work with the existing plugin API
> - Limit total DB access for Resource/Stack to O(n) in the number of
> resources
> - Limit overall DB access to O(m) in the number of edges
> - Limit lock contention to only those operations actually contending
> (i.e. no global locks)
> - Each worker task deals with only one resource
> - Only read resource attributes once
> 
> Open questions:
> - What do we do when we encounter a resource that is in progress
> from a previous update while doing a subsequent update? Obviously we
> don't want to interrupt it, as it will likely be left in an unknown
> state. Making a replacement is one obvious answer, but in many cases
> there could be serious down-sides to that. How long should we wait
> before trying it? What if it's still in progress because the engine
> processing the resource already died?
> 
> 
> Also, how do we implement resource level timeouts in general?
> 
> 
> Michał's Proposal
> =
> 
> https://github.com/inc0/heat-__convergence-prototype/tree/__iterative 
> 
> 
> Note that a version modified by me to use the same test scenario
> format (but not the same scenarios) is here:
> 
> 
> https://github.com/zaneb/heat-__convergence-prototype/tree/__iterative-adapted
> 
> 
> 
> This is based on my simulation framework after a fashion, but with
> everything implemented synchronously and a lot of handwaving about
> how the actual implementation could be distributed. The central
> premise is that at each step of the algorithm, the entire graph is
> examined for tasks that can be performed next, and those are then
> started. Once all are complete (it's synchronous, remember), the
> next step is run. Keen observers will be asking how we know when it
> is time to run the next step in a distributed version of this
> algorithm, where it will be run and what to do about resources that
> are in an intermediate state at that time. All of these questions
> remain unanswered.

Q1. When is it time to run the next step?
There is no real next step or previous step; there is a step which can be run 
*right now*, which effectively means when the current step finishes, because then 
and only then are all requirements met and we can proceed.
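(In code, "can be run right now" is just the set of nodes whose requirements are
all complete; a sketch:)

    # Sketch: the runnable "step" at any moment is every node whose
    # requirements have all finished.
    def runnable_now(requires, complete):
        # requires: {node: set(required_nodes)}; complete: set of done nodes
        return [node for node, reqs in requires.items()
                if node not in complete and reqs <= complete]

    graph = {'A': set(), 'B': {'A'}, 'C': {'B'}}
    print(runnable_now(graph, set()))        # ['A']
    print(runnable_now(graph, {'A'}))        # ['B']
    print(runnable_now(graph, {'A', 'B'}))   # ['C']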

Q2. I can see 3 main processes there:

* Converger (I'd assume that's the current engine):
A process which parses the graph and schedules the next steps; it will be run after 
a change in reality is detected.

* Observer:
Process which keeps r

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-26 Thread Angus Salkeld
On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter  wrote:

> A bunch of us have spent the last few weeks working independently on proof
> of concept designs for the convergence architecture. I think those efforts
> have now reached a sufficient level of maturity that we should start
> working together on synthesising them into a plan that everyone can forge
> ahead with. As a starting point I'm going to summarise my take on the three
> efforts; hopefully the authors of the other two will weigh in to give us
> their perspective.
>
>
> Zane's Proposal
> ===
>
> https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
>
> I implemented this as a simulator of the algorithm rather than using the
> Heat codebase itself in order to be able to iterate rapidly on the design,
> and indeed I have changed my mind many, many times in the process of
> implementing it. Its notable departure from a realistic simulation is that
> it runs only one operation at a time - essentially giving up the ability to
> detect race conditions in exchange for a completely deterministic test
> framework. You just have to imagine where the locks need to be.
> Incidentally, the test framework is designed so that it can easily be
> ported to the actual Heat code base as functional tests so that the same
> scenarios could be used without modification, allowing us to have
> confidence that the eventual implementation is a faithful replication of
> the simulation (which can be rapidly experimented on, adjusted and tested
> when we inevitably run into implementation issues).
>
> This is a complete implementation of Phase 1 (i.e. using existing resource
> plugins), including update-during-update, resource clean-up, replace on
> update and rollback; with tests.
>
> Some of the design goals which were successfully incorporated:
> - Minimise changes to Heat (it's essentially a distributed version of the
> existing algorithm), and in particular to the database
> - Work with the existing plugin API
> - Limit total DB access for Resource/Stack to O(n) in the number of
> resources
> - Limit overall DB access to O(m) in the number of edges
> - Limit lock contention to only those operations actually contending (i.e.
> no global locks)
> - Each worker task deals with only one resource
> - Only read resource attributes once
>
> Open questions:
> - What do we do when we encounter a resource that is in progress from a
> previous update while doing a subsequent update? Obviously we don't want to
> interrupt it, as it will likely be left in an unknown state. Making a
> replacement is one obvious answer, but in many cases there could be serious
> down-sides to that. How long should we wait before trying it? What if it's
> still in progress because the engine processing the resource already died?
>
>
Also, how do we implement resource level timeouts in general?


>
> Michał's Proposal
> =
>
> https://github.com/inc0/heat-convergence-prototype/tree/iterative
>
> Note that a version modified by me to use the same test scenario format
> (but not the same scenarios) is here:
>
> https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted
>
> This is based on my simulation framework after a fashion, but with
> everything implemented synchronously and a lot of handwaving about how the
> actual implementation could be distributed. The central premise is that at
> each step of the algorithm, the entire graph is examined for tasks that can
> be performed next, and those are then started. Once all are complete (it's
> synchronous, remember), the next step is run. Keen observers will be asking
> how we know when it is time to run the next step in a distributed version
> of this algorithm, where it will be run and what to do about resources that
> are in an intermediate state at that time. All of these questions remain
> unanswered.
>

Yes, I was struggling to figure out how it could manage an IN_PROGRESS
state as it's stateless. So you end up treading on the other action's toes.
Assuming we use the resource's state (IN_PROGRESS) you could get around
that. Then you kick off a converge whenever an action completes (if there
is nothing new to be done then do nothing).


>
> A non-exhaustive list of concerns I have:
> - Replace on update is not implemented yet
> - AFAIK rollback is not implemented yet
> - The simulation doesn't actually implement the proposed architecture
> - This approach is punishingly heavy on the database - O(n^2) or worse
>

Yes, re-reading the state of all resources whenever we run a new converge is
worrying, but I think Michal had some ideas to minimize this.


> - A lot of phase 2 is mixed in with phase 1 here, making it difficult to
> evaluate which changes need to be made first and whether this approach
> works with existing plugins
> - The code is not really based on how Heat works at the moment, so there
> would be either a major redesign required or lots of radical changes in
> Heat or both
>