Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-09 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2015-01-09 14:57:21 -0800:
 On 08/01/15 05:39, Anant Patil wrote:
  1. The stack was failing when there were single disjoint resources or
  just one resource in template. The graph did not include this resource
  due to a minor bug in dependency_names(). I have added a test case and
  fix here:
  https://github.com/anantpatil/heat-convergence-prototype/commit/b58abd77cf596475ecf3f19ed38adf8ad3bb6b3b
 
 Thanks, sorry about that! I will push a patch to fix it up.
 
  2. The resource graph is created with keys in both forward order
  traversal and reverse order traversal and the update will finish the
  forward order and attempt the reverse order. If this is the case, then
  the update-replaced resources will be deleted before the update is
  complete and if the update fails, the old resource is not available for
  roll-back; a new resource has to be created then. I have added a test
  case at the above mentioned location.
 
  In our PoC, the updates (concurrent updates) won't remove an
  update-replaced resource until all the resources are updated and the
  resource clean-up phase is started.
 
 Hmmm, this is a really interesting question actually. That's certainly 
 not how Heat works at the moment; we've always assumed that rollback is 
 best-effort at recovering the exact resources you had before. It would 
 be great to have users weigh in on how they expect this to behave. I'm 
 curious now what CloudFormation does.
 
 I'm reluctant to change it though because I'm pretty sure this is 
 definitely *not* how you would want e.g. a rolling update of an 
 autoscaling group to happen.
 
  It is unacceptable to remove the old
  resource to be rolled back to, since it may have changes which the user
  doesn't want to lose;
 
 If they didn't want to lose it they shouldn't have tried an update that 
 would replace it. If an update causes a replacement or an interruption 
 to service then I consider the same fair game for the rollback - the 
 user has already given us permission for that kind of change. (Whether 
 the user's consent was informed is a separate question, addressed by 
 Ryan's update-preview work.)
 

In the original vision we had for using scaled groups to manage, say,
nova-compute nodes, you definitely can't create new servers, so you
can't just create all the new instances without de-allocating some.

That said, that's why we are using in-place methods like rebuild.

I think it would be acceptable to have cleanup run asynchronously,
and to have rollback re-create anything that has already been cleaned up.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-09 Thread Zane Bitter

On 08/01/15 05:39, Anant Patil wrote:

1. The stack was failing when there were single disjoint resources or
just one resource in template. The graph did not include this resource
due to a minor bug in dependency_names(). I have added a test case and
fix here:
https://github.com/anantpatil/heat-convergence-prototype/commit/b58abd77cf596475ecf3f19ed38adf8ad3bb6b3b


Thanks, sorry about that! I will push a patch to fix it up.


2. The resource graph is created with keys in both forward order
traversal and reverse order traversal and the update will finish the
forward order and attempt the reverse order. If this is the case, then
the update-replaced resources will be deleted before the update is
complete and if the update fails, the old resource is not available for
roll-back; a new resource has to be created then. I have added a test
case at the above mentioned location.

In our PoC, the updates (concurrent updates) won't remove an
update-replaced resource until all the resources are updated and the
resource clean-up phase is started.


Hmmm, this is a really interesting question actually. That's certainly 
not how Heat works at the moment; we've always assumed that rollback is 
best-effort at recovering the exact resources you had before. It would 
be great to have users weigh in on how they expect this to behave. I'm 
curious now what CloudFormation does.


I'm reluctant to change it though because I'm pretty sure this is 
definitely *not* how you would want e.g. a rolling update of an 
autoscaling group to happen.



It is unacceptable to remove the old
resource to be rolled back to, since it may have changes which the user
doesn't want to lose;


If they didn't want to lose it they shouldn't have tried an update that 
would replace it. If an update causes a replacement or an interruption 
to service then I consider the same fair game for the rollback - the 
user has already given us permission for that kind of change. (Whether 
the user's consent was informed is a separate question, addressed by 
Ryan's update-preview work.)



and that's probably why they use the roll-back
flag.


I don't think there's any basis for saying that. People use the rollback 
flag because they want the stack left in a consistent state even if an 
error occurs.


cheers,
Zane.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-08 Thread Anant Patil
On 08-Jan-15 16:09, Anant Patil wrote:
 On 16-Dec-14 09:41, Zane Bitter wrote:
 On 15/12/14 09:32, Anant Patil wrote:
 On 12-Dec-14 06:29, Zane Bitter wrote:
 On 11/12/14 01:14, Anant Patil wrote:
 On 04-Dec-14 10:49, Zane Bitter wrote:
 On 01/12/14 02:02, Anant Patil wrote:
 On GitHub:https://github.com/anantpatil/heat-convergence-poc

 I'm trying to review this code at the moment, and finding some stuff I
 don't understand:

 https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

 This appears to loop through all of the resources *prior* to kicking off
 any actual updates to check if the resource will change. This is
 impossible to do in general, since a resource may obtain a property
 value from an attribute of another resource and there is no way to know
 whether an update to said other resource would cause a change in the
 attribute value.

 In addition, no attempt to catch UpdateReplace is made. Although that
 looks like a simple fix, I'm now worried about the level to which this
 code has been tested.

 We were working on a new branch and, as we discussed on Skype, we have
 handled all these cases. Please have a look at our current branch:
 https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

 When a new resource is taken for convergence, its children are loaded
 and the resource definition is re-parsed. The frozen resource definition
 will have all the get_attr references resolved.


 I'm also trying to wrap my head around how resources are cleaned up in
 dependency order. If I understand correctly, you store in the
 ResourceGraph table the dependencies between various resource names in
 the current template (presumably there could also be some left around
 from previous templates too?). For each resource name there may be a
 number of rows in the Resource table, each with an incrementing version.
 As far as I can tell though, there's nowhere that the dependency graph
 for _previous_ templates is persisted? So if the dependency order
 changes in the template we have no way of knowing the correct order to
 clean up in any more? (There's not even a mechanism to associate a
 resource version with a particular template, which might be one avenue
 by which to recover the dependencies.)

 I think this is an important case we need to be able to handle, so I
 added a scenario to my test framework to exercise it and discovered that
 my implementation was also buggy. Here's the fix:
 https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40


 Thanks for pointing this out, Zane. We too had a buggy implementation for
 handling inverted dependencies. I had a hard look at our algorithm, where
 we were continuously merging the edges from the new template into the edges
 from previous updates. It was an optimized way of traversing the graph
 in both forward and reverse order without missing any resources. But
 when the dependencies are inverted, this wouldn't work.

 We have changed our algorithm. The changes in edges are noted down in the
 DB; only the delta of edges from the previous template is calculated and
 kept. At any given point in time, the graph table has all the edges from the
 current template plus the delta from previous templates. Each edge has a
 template ID associated with it.
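
[A minimal sketch of the edge bookkeeping described above. The function and
row layout are illustrative assumptions, not the actual PoC schema.]

def merge_graph_rows(old_rows, new_edges, new_template_id):
    """Return the graph-table rows for an updated stack.

    old_rows:  iterable of (requirer, required, template_id) tuples from the DB.
    new_edges: set of (requirer, required) pairs parsed from the new template.

    Edges present in the new template are (re)tagged with the new template ID;
    edges found only in older templates are kept with their original template
    ID so that clean-up can still follow the old ordering.
    """
    rows = [(requirer, required, new_template_id)
            for (requirer, required) in new_edges]
    rows += [(requirer, required, template_id)
             for (requirer, required, template_id) in old_rows
             if (requirer, required) not in new_edges]
    return rows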

 The thing is, the cleanup dependencies aren't really about the template.
 The real resources really depend on other real resources. You can't
 delete a Volume before its VolumeAttachment, not because it says so in
 the template but because it will fail if you try. The template can give
 us a rough guide in advance to what those dependencies will be, but if
 that's all we keep then we are discarding information.

 There may be multiple versions of a resource corresponding to one
 template version. Even worse, the actual dependencies of a resource
 change on a smaller time scale than an entire stack update (this is the
 reason the current implementation updates the template one resource at a
 time as we go).


 Absolutely! The edges from the template are kept only for reference
 purposes. When we have a resource in the new template, its template ID will
 also be set to the current template. At any point in time, a realized
 resource will belong to the current template, even if it was found in previous
 templates. The template ID moves forward for a resource whenever it is found.

 In theory (disclaimer: I didn't implement this yet) it can change on an 
 even smaller timescale than that. The existing plugins are something of 
 a black box to us: if a failure occurs we don't necessarily know whether 
 the real-world dependency is on the old or new version of another resource.

 
 Yes, and that's why we roll back the failed resource and its dependent
 resources to older versions, provided that the older resources are not
 deleted until the update is done. It is easier with the template ID, as we know
 the previous complete template.
 
 Given that our Resource entries in the DB are in 1:1 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2015-01-08 Thread Anant Patil
On 16-Dec-14 09:41, Zane Bitter wrote:
 On 15/12/14 09:32, Anant Patil wrote:
 On 12-Dec-14 06:29, Zane Bitter wrote:
 On 11/12/14 01:14, Anant Patil wrote:
 On 04-Dec-14 10:49, Zane Bitter wrote:
 On 01/12/14 02:02, Anant Patil wrote:
 On GitHub:https://github.com/anantpatil/heat-convergence-poc

 I'm trying to review this code at the moment, and finding some stuff I
 don't understand:

 https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

 This appears to loop through all of the resources *prior* to kicking off
 any actual updates to check if the resource will change. This is
 impossible to do in general, since a resource may obtain a property
 value from an attribute of another resource and there is no way to know
 whether an update to said other resource would cause a change in the
 attribute value.

 In addition, no attempt to catch UpdateReplace is made. Although that
 looks like a simple fix, I'm now worried about the level to which this
 code has been tested.

 We were working on a new branch and, as we discussed on Skype, we have
 handled all these cases. Please have a look at our current branch:
 https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

 When a new resource is taken for convergence, its children are loaded
 and the resource definition is re-parsed. The frozen resource definition
 will have all the get_attr references resolved.


 I'm also trying to wrap my head around how resources are cleaned up in
 dependency order. If I understand correctly, you store in the
 ResourceGraph table the dependencies between various resource names in
 the current template (presumably there could also be some left around
 from previous templates too?). For each resource name there may be a
 number of rows in the Resource table, each with an incrementing version.
 As far as I can tell though, there's nowhere that the dependency graph
 for _previous_ templates is persisted? So if the dependency order
 changes in the template we have no way of knowing the correct order to
 clean up in any more? (There's not even a mechanism to associate a
 resource version with a particular template, which might be one avenue
 by which to recover the dependencies.)

 I think this is an important case we need to be able to handle, so I
 added a scenario to my test framework to exercise it and discovered that
 my implementation was also buggy. Here's the fix:
 https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40


 Thanks for pointing this out, Zane. We too had a buggy implementation for
 handling inverted dependencies. I had a hard look at our algorithm, where
 we were continuously merging the edges from the new template into the edges
 from previous updates. It was an optimized way of traversing the graph
 in both forward and reverse order without missing any resources. But
 when the dependencies are inverted, this wouldn't work.

 We have changed our algorithm. The changes in edges are noted down in the
 DB; only the delta of edges from the previous template is calculated and
 kept. At any given point in time, the graph table has all the edges from the
 current template plus the delta from previous templates. Each edge has a
 template ID associated with it.

 The thing is, the cleanup dependencies aren't really about the template.
 The real resources really depend on other real resources. You can't
 delete a Volume before its VolumeAttachment, not because it says so in
 the template but because it will fail if you try. The template can give
 us a rough guide in advance to what those dependencies will be, but if
 that's all we keep then we are discarding information.

 There may be multiple versions of a resource corresponding to one
 template version. Even worse, the actual dependencies of a resource
 change on a smaller time scale than an entire stack update (this is the
 reason the current implementation updates the template one resource at a
 time as we go).


 Absolutely! The edges from the template are kept only for reference
 purposes. When we have a resource in the new template, its template ID will
 also be set to the current template. At any point in time, a realized
 resource will belong to the current template, even if it was found in previous
 templates. The template ID moves forward for a resource whenever it is found.
 
 In theory (disclaimer: I didn't implement this yet) it can change on an 
 even smaller timescale than that. The existing plugins are something of 
 a black box to us: if a failure occurs we don't necessarily know whether 
 the real-world dependency is on the old or new version of another resource.
 

Yes, and that's why we roll back the failed resource and its dependent
resources to older versions, provided that the older resources are not
deleted until the update is done. It is easier with the template ID, as we know
the previous complete template.

 Given that our Resource entries in the DB are in 1:1 correspondence with
 actual resources (we create a 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Gurjar, Unmesh

 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Thursday, December 18, 2014 7:42 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 On 17/12/14 13:05, Gurjar, Unmesh wrote:
  I'm storing a tuple of its name and database ID. The data structure
  is resource.GraphKey. I was originally using the name for something,
  but I suspect I could probably drop it now and just store the
  database ID, but I haven't tried it yet. (Having the name in there
  definitely makes debugging more pleasant though ;)
 
 
  I agree, having name might come in handy while debugging!
 
  When I build the traversal graph each node is a tuple of the GraphKey
  and a boolean to indicate whether it corresponds to an update or a
  cleanup operation (both can appear for a single resource in the same
 graph).
 
  Just to confirm my understanding, the cleanup operation takes care of both:
  1. resources which are deleted as part of an update, and 2. the previous
  versioned resource which was replaced by a new resource
  (UpdateReplace scenario)
 
 Yes, correct. Also:
 
 3. resource versions which failed to delete for whatever reason on a previous
 traversal
 
  Also, the cleanup operation is performed after the update completes
 successfully.
 
 NO! They are not separate things!
 
 https://github.com/openstack/heat/blob/stable/juno/heat/engine/update.
 py#L177-L198
 
  If I am correct, you are updating all resources on update regardless
  of whether they change, which will be inefficient if the stack contains a
  million resources.
 
  I'm calling update() on all resources regardless of change, but
  update() will only call handle_update() if something has changed
  (unless the plugin has overridden Resource._needs_update()).
 
  There's no way to know whether a resource needs to be updated before
  you're ready to update it, so I don't think of this as 'inefficient', just
 'correct'.
 
  We have similar questions regarding other areas in your
  implementation, which we believe would be resolved if we understood the
  outline of your implementation. It is difficult to get a hold on your approach
  just by looking
  at code. Doc strings / an Etherpad will help.
 
 
  About streams: yes, in a million-resource stack the data will be
  huge, but
  less than the template.
 
  No way, it's O(n^3) (cubed!) in the worst case to store streams for
  each resource.
 
  Also, this stream is stored
  only in IN_PROGRESS resources.
 
  Now I'm really confused. Where does it come from if the resource
  doesn't get it until it's already in progress? And how will that 
  information
 help it?
 
 
  When an operation on a stack is initiated, the stream will be identified.
 
  OK, this may be one of the things I was getting confused about - I thought a
 'stream' belonged to one particular resource and just contained all of the
 paths to reaching that resource. But here it seems like you're saying that a
 'stream' is a representation of the entire graph?
 So it's essentially just a gratuitously bloated NIH serialisation of the
 Dependencies graph?
 
  To begin
  the operation, the action is initiated on the leaf (or root)
  resource(s) and the stream is stored (only) in this/these IN_PROGRESS
 resource(s).
 
 How does that work? Does it get deleted again when the resource moves to
 COMPLETE?
 

Yes, IMO, upon resource completion the stream can be deleted. I do not
foresee any situation in which storing the stream would be required,
since when another operation is initiated on the stack, that template should
be parsed and the new stream identified and used.

  The stream should then keep getting passed to the next/previous level
  of resource(s) as and when the dependencies for the next/previous level
 of resource(s) are met.
 
 That sounds... identical to the way it's implemented in my prototype (passing
 a serialisation of the graph down through the notification triggers), except 
 for
 the part about storing it in the Resource table.
 Why would we persist to the database data that we only need for the
 duration that we already have it in memory anyway?
 

Earlier we thought of passing it along while initiating the next level of
resource(s).
However, for a million-resource stack it will be quite large, and passing it
around will be inefficient. So we intended to store it in the database.

Also, it can be used to resume a stack operation when the processing engine
goes
down and another engine has to take over that stack operation.

 If we're going to persist it we should do so once, in the Stack table, at the
 time that we're preparing to start the traversal.
 
  The reason to have the entire dependency list is to reduce DB queries
  during a
  stack update.
 
  But we never need to know that. We only need to know what just
  happened and what to do next.
 
 
  As mentioned earlier, each level of resources in a graph passes on the
  dependency list/stream to its next/previous level

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-16 07:36:58 -0800:
 On 16-Dec-14 00:59, Clint Byrum wrote:
  Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
  On 13-Dec-14 05:42, Zane Bitter wrote:
  On 12/12/14 05:29, Murugan, Visnusaran wrote:
 
 
  -Original Message-
  From: Zane Bitter [mailto:zbit...@redhat.com]
  Sent: Friday, December 12, 2014 6:37 AM
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
  showdown
 
  On 11/12/14 08:26, Murugan, Visnusaran wrote:
  [Murugan, Visnusaran]
  In case of rollback where we have to cleanup earlier version of
  resources,
  we could get the order from old template. We'd prefer not to have a
  graph table.
 
  In theory you could get it by keeping old templates around. But that
  means keeping a lot of templates, and it will be hard to keep track
  of when you want to delete them. It also means that when starting an
  update you'll need to load every existing previous version of the
  template in order to calculate the dependencies. It also leaves the
  dependencies in an ambiguous state when a resource fails, and
  although that can be worked around it will be a giant pain to 
  implement.
 
 
  Agreed that looking at all templates for a delete is not good. But
  barring complexity, we feel we could achieve it by way of having an
  update and a delete stream for a stack update operation. I will
  elaborate in detail in the etherpad sometime tomorrow :)
 
  I agree that I'd prefer not to have a graph table. After trying a
  couple of different things I decided to store the dependencies in the
  Resource table, where we can read or write them virtually for free
  because it turns out that we are always reading or updating the
  Resource itself at exactly the same time anyway.
 
 
  Not sure how this will work in an update scenario when a resource does
  not change and its dependencies do.
 
  We'll always update the requirements, even when the properties don't
  change.
 
 
  Can you elaborate a bit on rollback.
 
  I didn't do anything special to handle rollback. It's possible that we 
  need to - obviously the difference in the UpdateReplace + rollback case 
  is that the replaced resource is now the one we want to keep, and yet 
  the replaced_by/replaces dependency will force the newer (replacement) 
  resource to be checked for deletion first, which is an inversion of the 
  usual order.
 
 
  This is where the version is so handy! For UpdateReplaced ones, there is
  an older version to go back to. This version could just be the template ID,
  as I mentioned in another e-mail. All resources are at the current
  template ID if they are found in the current template, even if there is
  no need to update them. Otherwise, they need to be cleaned up in the
  order given in the previous templates.
 
  I think the template ID is used as the version, as far as I can see in Zane's
  PoC. If the resource template key doesn't match the current template
  key, the resource is deleted. The version is a misnomer here, but that
  field (template ID) is used as though we had versions of resources.
 
  However, I tried to think of a scenario where that would cause problems 
  and I couldn't come up with one. Provided we know the actual, real-world 
  dependencies of each resource I don't think the ordering of those two 
  checks matters.
 
  In fact, I currently can't think of a case where the dependency order 
  between replacement and replaced resources matters at all. It matters in 
  the current Heat implementation because resources are artificially 
  segmented into the current and backup stacks, but with a holistic view 
  of dependencies that may well not be required. I tried taking that line 
  out of the simulator code and all the tests still passed. If anybody can 
  think of a scenario in which it would make a difference, I would be very 
  interested to hear it.
 
  In any event though, it should be no problem to reverse the direction of 
  that one edge in these particular circumstances if it does turn out to 
  be a problem.
 
  We had an approach with depends_on
  and needed_by columns in ResourceTable. But dropped it when we figured 
  out
  we had too many DB operations for Update.
 
  Yeah, I initially ran into this problem too - you have a bunch of nodes 
  that are waiting on the current node, and now you have to go look them 
  all up in the database to see what else they're waiting on in order to 
  tell if they're ready to be triggered.
 
  It turns out the answer is to distribute the writes but centralise the 
  reads. So at the start of the update, we read all of the Resources, 
  obtain their dependencies and build one central graph[1]. We then make 
  that graph available to each resource (either by passing it as a 
  notification parameter, or storing it somewhere central in the DB that 
  they will all have to read anyway, i.e. the Stack). But when we update a 
  dependency we

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-18 Thread Zane Bitter

On 15/12/14 07:47, Murugan, Visnusaran wrote:

We have similar questions regarding other
areas in your implementation, which we believe would be resolved if we understood
the outline of your implementation. It is difficult to get
a hold on your approach just by looking at code. Doc strings / an Etherpad will
help.


I added a bunch of extra docstrings and comments:

https://github.com/zaneb/heat-convergence-prototype/commit/5d79e009196dc224bd588e19edef5f0939b04607

I also implemented a --pdb option that will automatically set 
breakpoints at the beginning of all of the asynchronous events, so that 
you'll be dropped into the debugger and can single-step through the 
code, look at variables and so on:


https://github.com/zaneb/heat-convergence-prototype/commit/2a7a56dde21cad979fae25acc9fb01c6b4d9c6f7

I hope that helps. If you have more questions, please feel free to ask.

cheers,
Zane.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-17 Thread Gurjar, Unmesh
 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Tuesday, December 16, 2014 9:43 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 On 15/12/14 07:47, Murugan, Visnusaran wrote:
  Hi Zane,
 
  We have been going through this chain for quite some time now and we
 still feel a disconnect in our understanding.
 
 Yes, I thought last week that we were on the same page, but now it looks like
 we're drifting off again :(
 
  Can you put up an etherpad where we can understand your approach?
 
 Maybe you could put up an etherpad with your questions. Practically all of
 the relevant code is in Stack._create_or_update, Stack._dependencies and
 Converger.check_resource. That's 134 lines of code by my count.
 There's not a lot more I can usefully say about it without knowing which parts
 exactly you're stuck on, but I can definitely answer specific questions.
 
  For example: for storing resource dependencies, are you storing its
  name and version tuple, or just its ID?
 
 I'm storing a tuple of its name and database ID. The data structure is
 resource.GraphKey. I was originally using the name for something, but I
 suspect I could probably drop it now and just store the database ID, but I
 haven't tried it yet. (Having the name in there definitely makes debugging
 more pleasant though ;)
 

I agree, having name might come in handy while debugging!

 When I build the traversal graph each node is a tuple of the GraphKey and a
 boolean to indicate whether it corresponds to an update or a cleanup
 operation (both can appear for a single resource in the same graph).
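
[A rough sketch of the structures described in the quoted paragraph above;
the field names approximate the prototype and are not guaranteed to match
it exactly.]

import collections

# Approximation of the prototype's key type: a resource name plus its DB id.
GraphKey = collections.namedtuple('GraphKey', ['name', 'database_id'])

def traversal_nodes(resource_keys):
    """Each resource can contribute two nodes to the traversal graph:
    one for its update phase and one for its clean-up phase."""
    for key in resource_keys:
        yield (key, True)    # update node
        yield (key, False)   # clean-up node

# Example: a single resource appears twice in the same graph.
server = GraphKey('my_server', 42)
assert list(traversal_nodes([server])) == [(server, True), (server, False)]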

Just to confirm my understanding, the cleanup operation takes care of both:
1. resources which are deleted as part of an update, and
2. the previous versioned resource which was replaced by a new
resource (UpdateReplace scenario)
Also, the cleanup operation is performed after the update completes
successfully.

 
  If I am correct, you are updating all resources on update regardless
  of whether they change, which will be inefficient if the stack contains a
  million resources.
 
 I'm calling update() on all resources regardless of change, but update() will
 only call handle_update() if something has changed (unless the plugin has
 overridden Resource._needs_update()).
 
 There's no way to know whether a resource needs to be updated before
 you're ready to update it, so I don't think of this as 'inefficient', just 
 'correct'.
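
[A simplified sketch of the control flow described above - not the actual
heat.engine.resource.Resource code, just the shape of it; the property-diff
logic shown here is an assumption.]

class Resource(object):
    def _needs_update(self, after, before, after_props, before_props):
        # Default behaviour: only update when something actually changed.
        # A plugin may override this to force handle_update() every time.
        return after != before or after_props != before_props

    def handle_update(self, after, prop_diff):
        raise NotImplementedError  # implemented by each resource plugin

    def update(self, after, before, after_props, before_props):
        """Called for every resource in the stack on every update."""
        if not self._needs_update(after, before, after_props, before_props):
            return  # no change: handle_update() is never invoked
        prop_diff = {k: v for k, v in after_props.items()
                     if before_props.get(k) != v}
        self.handle_update(after, prop_diff)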
 
  We have similar questions regarding other areas in your
  implementation, which we believe would be resolved if we understood the
  outline of your implementation. It is difficult to get a hold on your approach
  just by
  looking at code. Doc strings / an Etherpad will help.
 
 
  About streams: yes, in a million-resource stack the data will be huge, but
 less than the template.
 
 No way, it's O(n^3) (cubed!) in the worst case to store streams for each
 resource.
 
  Also, this stream is stored
  only in IN_PROGRESS resources.
 
 Now I'm really confused. Where does it come from if the resource doesn't
 get it until it's already in progress? And how will that information help it?
 

When an operation on a stack is initiated, the stream will be identified. To begin
the operation, the action is initiated on the leaf (or root) resource(s) and the
stream is stored (only) in this/these IN_PROGRESS resource(s).
The stream should then keep getting passed to the next/previous level of 
resource(s) as
and when the dependencies for the next/previous level of resource(s) are met.
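
[One possible reading of that hand-off, as a sketch; the storage helpers
(store_stream, trigger, done_resources, delete_stream) are hypothetical and
only illustrate storing the stream on IN_PROGRESS resources and forwarding
it once a successor's dependencies are met.]

def on_resource_complete(completed, stream, db):
    """Pass the stream to the next level once its dependencies are satisfied."""
    for successor in stream['edges'].get(completed, ()):
        remaining = set(stream['requirements'][successor]) - db.done_resources()
        if not remaining:
            db.store_stream(successor, stream)  # kept only while IN_PROGRESS
            db.trigger(successor)               # e.g. an RPC cast to converge it
    db.delete_stream(completed)                 # dropped once COMPLETE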

  The reason to have the entire dependency list is to reduce DB queries during a
 stack update.
 
 But we never need to know that. We only need to know what just happened
 and what to do next.
 

As mentioned earlier, each level of resources in a graph passes on the dependency
list/stream to its next/previous level of resources. This information
should further
be used to determine what is to be done next and drive the operation to
completion.

  When you have a singular dependency on each resource, similar to your
  implementation, then we will end up loading dependencies one at a time and
 altering almost all resources' dependencies regardless of their change.
 
  Regarding the 2-template approach for delete, it is not actually 2
  different templates. It's just that we have a delete stream to be taken up
 post-update.
 
 That would be a regression from Heat's current behaviour, where we start
 cleaning up resources as soon as they have nothing depending on them.
 There's not even a reason to make it worse than what we already have,
 because it's actually a lot _easier_ to treat update and clean up as the same
 kind of operation and throw both into the same big graph. The dual
 implementations and all of the edge cases go away and you can just trust in
 the graph traversal to do the Right Thing in the most parallel way possible.
 
  (Any post operation will be handled as an update) This approach is
  True

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-17 Thread Zane Bitter

On 17/12/14 13:05, Gurjar, Unmesh wrote:

I'm storing a tuple of its name and database ID. The data structure is
resource.GraphKey. I was originally using the name for something, but I
suspect I could probably drop it now and just store the database ID, but I
haven't tried it yet. (Having the name in there definitely makes debugging
more pleasant though ;)



I agree, having name might come in handy while debugging!


When I build the traversal graph each node is a tuple of the GraphKey and a
boolean to indicate whether it corresponds to an update or a cleanup
operation (both can appear for a single resource in the same graph).


Just to confirm my understanding, the cleanup operation takes care of both:
1. resources which are deleted as part of an update, and
2. the previous versioned resource which was replaced by a new
resource (UpdateReplace scenario)


Yes, correct. Also:

3. resource versions which failed to delete for whatever reason on a 
previous traversal



Also, the cleanup operation is performed after the update completes 
successfully.


NO! They are not separate things!

https://github.com/openstack/heat/blob/stable/juno/heat/engine/update.py#L177-L198


If I am correct, you are updating all resources on update regardless
of whether they change, which will be inefficient if the stack contains a million resources.


I'm calling update() on all resources regardless of change, but update() will
only call handle_update() if something has changed (unless the plugin has
overridden Resource._needs_update()).

There's no way to know whether a resource needs to be updated before
you're ready to update it, so I don't think of this as 'inefficient', just 
'correct'.


We have similar questions regarding other areas in your
implementation, which we believe would be resolved if we understood the outline of your
implementation. It is difficult to get a hold on your approach just by looking
at code. Doc strings / an Etherpad will help.



About streams: yes, in a million-resource stack the data will be huge, but
less than the template.

No way, it's O(n^3) (cubed!) in the worst case to store streams for each
resource.


Also, this stream is stored
only in IN_PROGRESS resources.


Now I'm really confused. Where does it come from if the resource doesn't
get it until it's already in progress? And how will that information help it?



When an operation on a stack is initiated, the stream will be identified.


OK, this may be one of the things I was getting confused about - I
thought a 'stream' belonged to one particular resource and just contained
all of the paths to reaching that resource. But here it seems like 
you're saying that a 'stream' is a representation of the entire graph? 
So it's essentially just a gratuitously bloated NIH serialisation of the 
Dependencies graph?



To begin
the operation, the action is initiated on the leaf (or root) resource(s) and the
stream is stored (only) in this/these IN_PROGRESS resource(s).


How does that work? Does it get deleted again when the resource moves to 
COMPLETE?



The stream should then keep getting passed to the next/previous level of 
resource(s) as
and when the dependencies for the next/previous level of resource(s) are met.


That sounds... identical to the way it's implemented in my prototype 
(passing a serialisation of the graph down through the notification 
triggers), except for the part about storing it in the Resource table. 
Why would we persist to the database data that we only need for the 
duration that we already have it in memory anyway?


If we're going to persist it we should do so once, in the Stack table, 
at the time that we're preparing to start the traversal.



The reason to have the entire dependency list is to reduce DB queries during a
stack update.

But we never need to know that. We only need to know what just happened
and what to do next.



As mentioned earlier, each level of resources in a graph passes on the dependency
list/stream to its next/previous level of resources. This information
should further
be used to determine what is to be done next and drive the operation to
completion.


Why would we store *and* forward?


When you have a singular dependency on each resource, similar to your
implementation, then we will end up loading dependencies one at a time and
altering almost all resources' dependencies regardless of their change.


Regarding the 2-template approach for delete, it is not actually 2
different templates. It's just that we have a delete stream to be taken up
post-update.

That would be a regression from Heat's current behaviour, where we start
cleaning up resources as soon as they have nothing depending on them.
There's not even a reason to make it worse than what we already have,
because it's actually a lot _easier_ to treat update and clean up as the same
kind of operation and throw both into the same big graph. The dual
implementations and all of the edge cases go away and you can just trust in
the graph traversal to do the Right Thing in 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-16 Thread Anant Patil
On 16-Dec-14 00:59, Clint Byrum wrote:
 Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
 On 13-Dec-14 05:42, Zane Bitter wrote:
 On 12/12/14 05:29, Murugan, Visnusaran wrote:


 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Friday, December 12, 2014 6:37 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown

 On 11/12/14 08:26, Murugan, Visnusaran wrote:
 [Murugan, Visnusaran]
 In case of rollback where we have to cleanup earlier version of
 resources,
 we could get the order from old template. We'd prefer not to have a
 graph table.

 In theory you could get it by keeping old templates around. But that
 means keeping a lot of templates, and it will be hard to keep track
 of when you want to delete them. It also means that when starting an
 update you'll need to load every existing previous version of the
 template in order to calculate the dependencies. It also leaves the
 dependencies in an ambiguous state when a resource fails, and
 although that can be worked around it will be a giant pain to implement.


 Agreed that looking at all templates for a delete is not good. But
 barring complexity, we feel we could achieve it by way of having an
 update and a delete stream for a stack update operation. I will
 elaborate in detail in the etherpad sometime tomorrow :)

 I agree that I'd prefer not to have a graph table. After trying a
 couple of different things I decided to store the dependencies in the
 Resource table, where we can read or write them virtually for free
 because it turns out that we are always reading or updating the
 Resource itself at exactly the same time anyway.


 Not sure how this will work in an update scenario when a resource does
 not change and its dependencies do.

 We'll always update the requirements, even when the properties don't
 change.


 Can you elaborate a bit on rollback.

 I didn't do anything special to handle rollback. It's possible that we 
 need to - obviously the difference in the UpdateReplace + rollback case 
 is that the replaced resource is now the one we want to keep, and yet 
 the replaced_by/replaces dependency will force the newer (replacement) 
 resource to be checked for deletion first, which is an inversion of the 
 usual order.


 This is where the version is so handy! For UpdateReplaced ones, there is
 an older version to go back to. This version could just be the template ID,
 as I mentioned in another e-mail. All resources are at the current
 template ID if they are found in the current template, even if there is
 no need to update them. Otherwise, they need to be cleaned up in the
 order given in the previous templates.

 I think the template ID is used as the version, as far as I can see in Zane's
 PoC. If the resource template key doesn't match the current template
 key, the resource is deleted. The version is a misnomer here, but that
 field (template ID) is used as though we had versions of resources.
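
[In code, the clean-up check being described is roughly the following sketch;
names are illustrative, with the stored template key serving as the "version".]

def needs_cleanup(resource, current_template_id):
    """A stored resource whose template key is not the current one is a
    leftover from a previous traversal and is a candidate for deletion."""
    return resource.template_id != current_template_id

def resources_to_clean_up(stack_resources, current_template_id):
    return [res for res in stack_resources
            if needs_cleanup(res, current_template_id)]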

 However, I tried to think of a scenario where that would cause problems 
 and I couldn't come up with one. Provided we know the actual, real-world 
 dependencies of each resource I don't think the ordering of those two 
 checks matters.

 In fact, I currently can't think of a case where the dependency order 
 between replacement and replaced resources matters at all. It matters in 
 the current Heat implementation because resources are artificially 
 segmented into the current and backup stacks, but with a holistic view 
 of dependencies that may well not be required. I tried taking that line 
 out of the simulator code and all the tests still passed. If anybody can 
 think of a scenario in which it would make a difference, I would be very 
 interested to hear it.

 In any event though, it should be no problem to reverse the direction of 
 that one edge in these particular circumstances if it does turn out to 
 be a problem.

 We had an approach with depends_on
 and needed_by columns in ResourceTable. But dropped it when we figured out
 we had too many DB operations for Update.

 Yeah, I initially ran into this problem too - you have a bunch of nodes 
 that are waiting on the current node, and now you have to go look them 
 all up in the database to see what else they're waiting on in order to 
 tell if they're ready to be triggered.

 It turns out the answer is to distribute the writes but centralise the 
 reads. So at the start of the update, we read all of the Resources, 
 obtain their dependencies and build one central graph[1]. We then make 
 that graph available to each resource (either by passing it as a 
 notification parameter, or storing it somewhere central in the DB that 
 they will all have to read anyway, i.e. the Stack). But when we update a 
 dependency we don't update the central graph, we update the individual 
 Resource so there's no global lock required.

 [1] 
 https://github.com/zaneb/heat-convergence-prototype/blob

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Murugan, Visnusaran
Hi Zane,

We have been going through this chain for quite some time now and we still feel
a disconnect in our understanding.
Can you put up an etherpad where we can understand your approach? For example:
for storing resource dependencies,
are you storing its name and version tuple, or just its ID? If I am correct, you
are updating all resources on update regardless
of whether they change, which will be inefficient if the stack contains a million resources.
We have similar questions regarding other
areas in your implementation, which we believe would be resolved if we understood
the outline of your implementation. It is difficult to get
a hold on your approach just by looking at code. Doc strings / an Etherpad will
help.


About streams: yes, in a million-resource stack the data will be huge, but less
than the template. Also, this stream is stored
only in IN_PROGRESS resources. The reason to have the entire dependency list is to
reduce DB queries during a stack update.
When you have a singular dependency on each resource, similar to your
implementation, then we will end up loading
dependencies one at a time and altering almost all resources' dependencies
regardless of their change.

Regarding the 2-template approach for delete, it is not actually 2 different
templates. It's just that we have a delete stream
to be taken up post-update. (Any post operation will be handled as an update.)
This approach applies when Rollback==True;
we can always fall back to the regular (non-delete) stream if Rollback==False.

In our view we would like to have only one basic operation, and that is UPDATE
(sketched below):

1. CREATE will be an update where the realized graph == empty
2. UPDATE will be an update where the realized graph == fully/partially realized
(possibly with a delete stream as a post operation if Rollback==True)
3. DELETE will be just another update with an empty to_be_realized_graph.
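
[A minimal sketch of that "everything is an update" model, with placeholder
names; converge() and the graph helpers are illustrative, not PoC code.]

def converge(stack, realized, target):
    """The single basic operation: drive the realized graph towards the
    target graph, creating, updating and deleting resources as needed."""
    raise NotImplementedError  # placeholder for the actual convergence logic

def empty_graph():
    return {}

def create(stack, template_graph):
    # CREATE: nothing is realized yet.
    return converge(stack, realized=empty_graph(), target=template_graph)

def update(stack, realized_graph, template_graph):
    # UPDATE: a fully or partially realized graph converging on the new template.
    return converge(stack, realized=realized_graph, target=template_graph)

def delete(stack, realized_graph):
    # DELETE: just another update with an empty target graph.
    return converge(stack, realized=realized_graph, target=empty_graph())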

It would be great if we can freeze a stable approach by mid-week as Christmas 
vacations are round the corner.  :) :)

 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Saturday, December 13, 2014 5:43 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 On 12/12/14 05:29, Murugan, Visnusaran wrote:
 
 
  -Original Message-
  From: Zane Bitter [mailto:zbit...@redhat.com]
  Sent: Friday, December 12, 2014 6:37 AM
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
  showdown
 
  On 11/12/14 08:26, Murugan, Visnusaran wrote:
  [Murugan, Visnusaran]
  In case of rollback where we have to cleanup earlier version of
  resources,
  we could get the order from old template. We'd prefer not to have a
  graph table.
 
  In theory you could get it by keeping old templates around. But
  that means keeping a lot of templates, and it will be hard to keep
  track of when you want to delete them. It also means that when
  starting an update you'll need to load every existing previous
  version of the template in order to calculate the dependencies. It
  also leaves the dependencies in an ambiguous state when a resource
  fails, and although that can be worked around it will be a giant pain to
 implement.
 
 
  Agreed that looking at all templates for a delete is not good. But
  barring complexity, we feel we could achieve it by way of having an
  update and a delete stream for a stack update operation. I will
  elaborate in detail in the etherpad sometime tomorrow :)
 
  I agree that I'd prefer not to have a graph table. After trying a
  couple of different things I decided to store the dependencies in
  the Resource table, where we can read or write them virtually for
  free because it turns out that we are always reading or updating
  the Resource itself at exactly the same time anyway.
 
 
  Not sure how this will work in an update scenario when a resource
  does not change and its dependencies do.
 
  We'll always update the requirements, even when the properties don't
  change.
 
 
  Can you elaborate a bit on rollback.
 
 I didn't do anything special to handle rollback. It's possible that we need 
 to -
 obviously the difference in the UpdateReplace + rollback case is that the
 replaced resource is now the one we want to keep, and yet the
 replaced_by/replaces dependency will force the newer (replacement)
 resource to be checked for deletion first, which is an inversion of the usual
 order.
 
 However, I tried to think of a scenario where that would cause problems and
 I couldn't come up with one. Provided we know the actual, real-world
 dependencies of each resource I don't think the ordering of those two
 checks matters.
 
 In fact, I currently can't think of a case where the dependency order
 between replacement and replaced resources matters at all. It matters in the
 current Heat implementation because resources are artificially segmented
 into the current and backup stacks, but with a holistic view of dependencies
 that may well

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Anant Patil
On 12-Dec-14 06:29, Zane Bitter wrote:
 On 11/12/14 01:14, Anant Patil wrote:
 On 04-Dec-14 10:49, Zane Bitter wrote:
 On 01/12/14 02:02, Anant Patil wrote:
 On GitHub:https://github.com/anantpatil/heat-convergence-poc

 I'm trying to review this code at the moment, and finding some stuff I
 don't understand:

 https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

 This appears to loop through all of the resources *prior* to kicking off
 any actual updates to check if the resource will change. This is
 impossible to do in general, since a resource may obtain a property
 value from an attribute of another resource and there is no way to know
 whether an update to said other resource would cause a change in the
 attribute value.

 In addition, no attempt to catch UpdateReplace is made. Although that
 looks like a simple fix, I'm now worried about the level to which this
 code has been tested.

 We were working on a new branch and, as we discussed on Skype, we have
 handled all these cases. Please have a look at our current branch:
 https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

 When a new resource is taken for convergence, its children are loaded
 and the resource definition is re-parsed. The frozen resource definition
 will have all the get_attr references resolved.


 I'm also trying to wrap my head around how resources are cleaned up in
 dependency order. If I understand correctly, you store in the
 ResourceGraph table the dependencies between various resource names in
 the current template (presumably there could also be some left around
 from previous templates too?). For each resource name there may be a
 number of rows in the Resource table, each with an incrementing version.
 As far as I can tell though, there's nowhere that the dependency graph
 for _previous_ templates is persisted? So if the dependency order
 changes in the template we have no way of knowing the correct order to
 clean up in any more? (There's not even a mechanism to associate a
 resource version with a particular template, which might be one avenue
 by which to recover the dependencies.)

 I think this is an important case we need to be able to handle, so I
 added a scenario to my test framework to exercise it and discovered that
 my implementation was also buggy. Here's the fix:
 https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40


 Thanks for pointing this out, Zane. We too had a buggy implementation for
 handling inverted dependencies. I had a hard look at our algorithm, where
 we were continuously merging the edges from the new template into the edges
 from previous updates. It was an optimized way of traversing the graph
 in both forward and reverse order without missing any resources. But
 when the dependencies are inverted, this wouldn't work.

 We have changed our algorithm. The changes in edges are noted down in the
 DB; only the delta of edges from the previous template is calculated and
 kept. At any given point in time, the graph table has all the edges from the
 current template plus the delta from previous templates. Each edge has a
 template ID associated with it.
 
 The thing is, the cleanup dependencies aren't really about the template. 
 The real resources really depend on other real resources. You can't 
 delete a Volume before its VolumeAttachment, not because it says so in 
 the template but because it will fail if you try. The template can give 
 us a rough guide in advance to what those dependencies will be, but if 
 that's all we keep then we are discarding information.
 
 There may be multiple versions of a resource corresponding to one 
 template version. Even worse, the actual dependencies of a resource 
 change on a smaller time scale than an entire stack update (this is the 
 reason the current implementation updates the template one resource at a 
 time as we go).
 

Absolutely! The edges from the template are kept only for reference
purposes. When we have a resource in the new template, its template ID will
also be set to the current template. At any point in time, a realized
resource will belong to the current template, even if it was found in previous
templates. The template ID moves forward for a resource whenever it is found.

 Given that our Resource entries in the DB are in 1:1 correspondence with 
 actual resources (we create a new one whenever we need to replace the 
 underlying resource), I found it makes the most conceptual and practical 
 sense to store the requirements in the resource itself, and update them 
 at the time they actually change in the real world (bonus: introduces no 
 new locking issues and no extra DB writes). I settled on this after a 
 legitimate attempt at trying other options, but they didn't work out: 
 https://github.com/zaneb/heat-convergence-prototype/commit/a62958342e8583f74e2aca90f6239ad457ba984d
 

I am okay with the notion of the graph being stored in the resource table.

 For resource clean up, we start from the
 first 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Anant Patil
On 13-Dec-14 05:42, Zane Bitter wrote:
 On 12/12/14 05:29, Murugan, Visnusaran wrote:


 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Friday, December 12, 2014 6:37 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown

 On 11/12/14 08:26, Murugan, Visnusaran wrote:
 [Murugan, Visnusaran]
 In case of rollback where we have to cleanup earlier version of
 resources,
 we could get the order from old template. We'd prefer not to have a
 graph table.

 In theory you could get it by keeping old templates around. But that
 means keeping a lot of templates, and it will be hard to keep track
 of when you want to delete them. It also means that when starting an
 update you'll need to load every existing previous version of the
 template in order to calculate the dependencies. It also leaves the
 dependencies in an ambiguous state when a resource fails, and
 although that can be worked around it will be a giant pain to implement.


 Agreed that looking at all templates for a delete is not good. But
 barring complexity, we feel we could achieve it by way of having an
 update and a delete stream for a stack update operation. I will
 elaborate in detail in the etherpad sometime tomorrow :)

 I agree that I'd prefer not to have a graph table. After trying a
 couple of different things I decided to store the dependencies in the
 Resource table, where we can read or write them virtually for free
 because it turns out that we are always reading or updating the
 Resource itself at exactly the same time anyway.


 Not sure how this will work in an update scenario when a resource does
 not change and its dependencies do.

 We'll always update the requirements, even when the properties don't
 change.


 Can you elaborate a bit on rollback.
 
 I didn't do anything special to handle rollback. It's possible that we 
 need to - obviously the difference in the UpdateReplace + rollback case 
 is that the replaced resource is now the one we want to keep, and yet 
 the replaced_by/replaces dependency will force the newer (replacement) 
 resource to be checked for deletion first, which is an inversion of the 
 usual order.
 

This is where the version is so handy! For UpdateReplaced ones, there is
an older version to go back to. This version could just be the template ID,
as I mentioned in another e-mail. All resources are at the current
template ID if they are found in the current template, even if there is
no need to update them. Otherwise, they need to be cleaned up in the
order given in the previous templates.

I think the template ID is used as the version, as far as I can see in Zane's
PoC. If the resource template key doesn't match the current template
key, the resource is deleted. The version is a misnomer here, but that
field (template ID) is used as though we had versions of resources.

 However, I tried to think of a scenario where that would cause problems 
 and I couldn't come up with one. Provided we know the actual, real-world 
 dependencies of each resource I don't think the ordering of those two 
 checks matters.
 
 In fact, I currently can't think of a case where the dependency order 
 between replacement and replaced resources matters at all. It matters in 
 the current Heat implementation because resources are artificially 
 segmented into the current and backup stacks, but with a holistic view 
 of dependencies that may well not be required. I tried taking that line 
 out of the simulator code and all the tests still passed. If anybody can 
 think of a scenario in which it would make a difference, I would be very 
 interested to hear it.
 
 In any event though, it should be no problem to reverse the direction of 
 that one edge in these particular circumstances if it does turn out to 
 be a problem.
 
 We had an approach with depends_on
 and needed_by columns in ResourceTable. But dropped it when we figured out
 we had too many DB operations for Update.
 
 Yeah, I initially ran into this problem too - you have a bunch of nodes 
 that are waiting on the current node, and now you have to go look them 
 all up in the database to see what else they're waiting on in order to 
 tell if they're ready to be triggered.
 
 It turns out the answer is to distribute the writes but centralise the 
 reads. So at the start of the update, we read all of the Resources, 
 obtain their dependencies and build one central graph[1]. We then make 
 that graph available to each resource (either by passing it as a 
 notification parameter, or storing it somewhere central in the DB that 
 they will all have to read anyway, i.e. the Stack). But when we update a 
 dependency we don't update the central graph, we update the individual 
 Resource so there's no global lock required.
 
 [1] 
 https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/stack.py#L166-L168
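
A minimal sketch of that "distribute the writes, centralise the reads" pattern
(assumed in-memory structures, not the actual prototype code): the whole set of
Resources is read once at the start of the traversal to build one graph, and
afterwards each resource only ever writes its own row.

    from collections import defaultdict

    def build_graph(resources):
        """resources: iterable of objects with .key and .requires attributes."""
        requires = defaultdict(set)     # node -> keys it depends on
        required_by = defaultdict(set)  # node -> keys that depend on it
        for res in resources:
            requires[res.key] |= set(res.requires)
            for req in res.requires:
                required_by[req].add(res.key)
        return requires, required_by

    def ready(key, requires, done):
        # a node can be triggered once everything it requires has completed
        return requires[key] <= done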
 

A centralized graph and decision making will make

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-15 07:15:30 -0800:
 On 13-Dec-14 05:42, Zane Bitter wrote:
  On 12/12/14 05:29, Murugan, Visnusaran wrote:
 
 
  -Original Message-
  From: Zane Bitter [mailto:zbit...@redhat.com]
  Sent: Friday, December 12, 2014 6:37 AM
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
  showdown
 
  On 11/12/14 08:26, Murugan, Visnusaran wrote:
  [Murugan, Visnusaran]
  In case of rollback where we have to cleanup earlier version of
  resources,
  we could get the order from old template. We'd prefer not to have a
  graph table.
 
  In theory you could get it by keeping old templates around. But that
  means keeping a lot of templates, and it will be hard to keep track
  of when you want to delete them. It also means that when starting an
  update you'll need to load every existing previous version of the
  template in order to calculate the dependencies. It also leaves the
  dependencies in an ambiguous state when a resource fails, and
  although that can be worked around it will be a giant pain to implement.
 
 
  Agree that looking to all templates for a delete is not good. But
  barring complexity, we feel we could achieve it by way of having an
  update and a delete stream for a stack update operation. I will
  elaborate in detail in the etherpad sometime tomorrow :)
 
  I agree that I'd prefer not to have a graph table. After trying a
  couple of different things I decided to store the dependencies in the
  Resource table, where we can read or write them virtually for free
  because it turns out that we are always reading or updating the
  Resource itself at exactly the same time anyway.
 
 
  Not sure how this will work in an update scenario when a resource does
  not change and its dependencies do.
 
  We'll always update the requirements, even when the properties don't
  change.
 
 
  Can you elaborate a bit on rollback.
  
  I didn't do anything special to handle rollback. It's possible that we 
  need to - obviously the difference in the UpdateReplace + rollback case 
  is that the replaced resource is now the one we want to keep, and yet 
  the replaced_by/replaces dependency will force the newer (replacement) 
  resource to be checked for deletion first, which is an inversion of the 
  usual order.
  
 
 This is where the version is so handy! For UpdateReplaced ones, there is
 an older version to go back to. This version could just be template ID,
 as I mentioned in another e-mail. All resources are at the current
 template ID if they are found in the current template, even if there is
 no need to update them. Otherwise, they need to be cleaned-up in the
 order given in the previous templates.
 
 I think the template ID is used as version as far as I can see in Zane's
 PoC. If the resource template key doesn't match the current template
 key, the resource is deleted. The version is a misnomer here, but that
 field (template id) is used as though we had versions of resources.
 
  However, I tried to think of a scenario where that would cause problems 
  and I couldn't come up with one. Provided we know the actual, real-world 
  dependencies of each resource I don't think the ordering of those two 
  checks matters.
  
  In fact, I currently can't think of a case where the dependency order 
  between replacement and replaced resources matters at all. It matters in 
  the current Heat implementation because resources are artificially 
  segmented into the current and backup stacks, but with a holistic view 
  of dependencies that may well not be required. I tried taking that line 
  out of the simulator code and all the tests still passed. If anybody can 
  think of a scenario in which it would make a difference, I would be very 
  interested to hear it.
  
  In any event though, it should be no problem to reverse the direction of 
  that one edge in these particular circumstances if it does turn out to 
  be a problem.
  
  We had an approach with depends_on
  and needed_by columns in ResourceTable. But dropped it when we figured out
  we had too many DB operations for Update.
  
  Yeah, I initially ran into this problem too - you have a bunch of nodes 
  that are waiting on the current node, and now you have to go look them 
  all up in the database to see what else they're waiting on in order to 
  tell if they're ready to be triggered.
  
  It turns out the answer is to distribute the writes but centralise the 
  reads. So at the start of the update, we read all of the Resources, 
  obtain their dependencies and build one central graph[1]. We then make 
  that graph available to each resource (either by passing it as a 
  notification parameter, or storing it somewhere central in the DB that 
  they will all have to read anyway, i.e. the Stack). But when we update a 
  dependency we don't update the central graph, we update the individual 
  Resource so there's no global lock required.
  
  [1

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 09:32, Anant Patil wrote:

On 12-Dec-14 06:29, Zane Bitter wrote:

On 11/12/14 01:14, Anant Patil wrote:

On 04-Dec-14 10:49, Zane Bitter wrote:

On 01/12/14 02:02, Anant Patil wrote:

On GitHub:https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I
don't understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off
any actual updates to check if the resource will change. This is
impossible to do in general, since a resource may obtain a property
value from an attribute of another resource and there is no way to know
whether an update to said other resource would cause a change in the
attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that
looks like a simple fix, I'm now worried about the level to which this
code has been tested.


We were working on new branch and as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:
https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

When a new resource is taken for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will have all the get_attr resolved.



I'm also trying to wrap my head around how resources are cleaned up in
dependency order. If I understand correctly, you store in the
ResourceGraph table the dependencies between various resource names in
the current template (presumably there could also be some left around
from previous templates too?). For each resource name there may be a
number of rows in the Resource table, each with an incrementing version.
As far as I can tell though, there's nowhere that the dependency graph
for _previous_ templates is persisted? So if the dependency order
changes in the template we have no way of knowing the correct order to
clean up in any more? (There's not even a mechanism to associate a
resource version with a particular template, which might be one avenue
by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I
added a scenario to my test framework to exercise it and discovered that
my implementation was also buggy. Here's the fix:
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40



Thanks for pointing this out Zane. We too had a buggy implementation for
handling inverted dependency. I had a hard look at our algorithm where
we were continuously merging the edges from new template into the edges
from previous updates. It was an optimized way of traversing the graph
in both forward and reverse order without missing any resources. But,
when the dependencies are inverted, this wouldn't work.

We have changed our algorithm. The changes in edges are noted down in
DB, only the delta of edges from previous template is calculated and
kept. At any given point of time, the graph table has all the edges from
current template and delta from previous templates. Each edge has
template ID associated with it.
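
A hedged sketch of that edge bookkeeping (hypothetical structures, not the PoC
code): the graph table ends up holding every edge of the current template,
tagged with the current template ID, plus only the older edges that are not in
the current set, still tagged with the template they came from.

    def merge_edges(stored, current_edges, current_tid):
        """stored: dict mapping (from, to) edge -> template ID;
        current_edges: set of (from, to) edges in the new template."""
        merged = {e: tid for e, tid in stored.items() if e not in current_edges}
        merged.update({e: current_tid for e in current_edges})
        return merged

    old = {('B', 'A'): 't1', ('C', 'B'): 't1'}
    new = {('C', 'A')}                  # dependencies were rewired in t2
    print(merge_edges(old, new, 't2'))
    # {('B', 'A'): 't1', ('C', 'B'): 't1', ('C', 'A'): 't2'}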


The thing is, the cleanup dependencies aren't really about the template.
The real resources really depend on other real resources. You can't
delete a Volume before its VolumeAttachment, not because it says so in
the template but because it will fail if you try. The template can give
us a rough guide in advance to what those dependencies will be, but if
that's all we keep then we are discarding information.

There may be multiple versions of a resource corresponding to one
template version. Even worse, the actual dependencies of a resource
change on a smaller time scale than an entire stack update (this is the
reason the current implementation updates the template one resource at a
time as we go).



Absolutely! The edges from the template are kept only for reference
purposes. When we have a resource in the new template, its template ID will
also be marked as the current template. At any point of time, a realized
resource will be from the current template, even if it was found in previous
templates. The template ID moves for a resource if it is found.


In theory (disclaimer: I didn't implement this yet) it can change on an 
even smaller timescale than that. The existing plugins are something of 
a black box to us: if a failure occurs we don't necessarily know whether 
the real-world dependency is on the old or new version of another resource.



Given that our Resource entries in the DB are in 1:1 correspondence with
actual resources (we create a new one whenever we need to replace the
underlying resource), I found it makes the most conceptual and practical
sense to store the requirements in the resource itself, and update them
at the time they actually change in the real world (bonus: introduces no
new locking issues and no extra DB writes). I settled on this after a
legitimate attempt at 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 07:47, Murugan, Visnusaran wrote:

Hi Zane,

We have been going through this chain for quite some time now and we still feel 
a disconnect in our understanding.


Yes, I thought last week that we were on the same page, but now it looks 
like we're drifting off again :(



 Can you put up an etherpad where we can understand your approach?


Maybe you could put up an etherpad with your questions. Practically all 
of the relevant code is in Stack._create_or_update, Stack._dependencies 
and Converger.check_resource. That's 134 lines of code by my count. 
There's not a lot more I can usefully say about it without knowing which 
parts exactly you're stuck on, but I can definitely answer specific 
questions.



 For example: for storing resource dependencies,
 are you storing its (name, version) tuple or just its ID?


I'm storing a tuple of its name and database ID. The data structure is 
resource.GraphKey. I was originally using the name for something, but I 
suspect I could probably drop it now and just store the database ID, but 
I haven't tried it yet. (Having the name in there definitely makes 
debugging more pleasant though ;)


When I build the traversal graph each node is a tuple of the GraphKey 
and a boolean to indicate whether it corresponds to an update or a 
cleanup operation (both can appear for a single resource in the same graph).
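
Roughly, the node shape being described looks like this (a sketch based on the
description above, not the prototype's exact code):

    import collections

    # (resource name, database id) -- the name is mainly useful for debugging
    GraphKey = collections.namedtuple('GraphKey', ['name', 'key'])

    def traversal_nodes(resource_name, db_id):
        gk = GraphKey(resource_name, db_id)
        update_node = (gk, True)     # forward (update) phase
        cleanup_node = (gk, False)   # reverse (cleanup) phase
        return update_node, cleanup_node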



 If I am correct, you are updating all resources on update regardless
 of whether they changed, which will be inefficient if a stack contains a million resources.


I'm calling update() on all resources regardless of change, but update() 
will only call handle_update() if something has changed (unless the 
plugin has overridden Resource._needs_update()).


There's no way to know whether a resource needs to be updated before 
you're ready to update it, so I don't think of this as 'inefficient', 
just 'correct'.
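
A simplified sketch of that control flow (not the real Resource class; the
method names follow the description above):

    class SketchResource(object):
        def _needs_update(self, after, before):
            # plugins may override this to force or suppress updates
            return after != before

        def handle_update(self, after, before, diff):
            pass                         # plugin-specific work goes here

        def update(self, after, before):
            if not self._needs_update(after, before):
                return                   # nothing changed: cheap no-op
            diff = {'before': before, 'after': after}
            self.handle_update(after, before, diff)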



 We have similar questions regarding other
 areas in your implementation, which we believe would be answered if we understood the
 outline of your implementation. It is difficult to get
 a hold on your approach just by looking at the code. Docstrings / an etherpad will
 help.


 About streams: yes, in a million-resource stack the data will be huge, but less 
 than the template.


No way, it's O(n^3) (cubed!) in the worst case to store streams for each 
resource.



 Also, this stream is stored
 only in IN_PROGRESS resources.


Now I'm really confused. Where does it come from if the resource doesn't 
get it until it's already in progress? And how will that information 
help it?



 The reason to have the entire dependency list is to reduce DB queries during a stack 
 update.


But we never need to know that. We only need to know what just happened 
and what to do next.



 When you have a singular dependency on each resource, similar to your 
 implementation, then we will end up loading
 dependencies one at a time and altering almost all resources' dependencies 
 regardless of whether they change.

 Regarding the 2-template approach for delete, it is not actually 2 different 
 templates. It's just that we have a delete stream
 to be taken up post-update.


That would be a regression from Heat's current behaviour, where we start 
cleaning up resources as soon as they have nothing depending on them. 
There's not even a reason to make it worse than what we already have, 
because it's actually a lot _easier_ to treat update and clean up as the 
same kind of operation and throw both into the same big graph. The dual 
implementations and all of the edge cases go away and you can just trust 
in the graph traversal to do the Right Thing in the most parallel way 
possible.



(Any post operation will be handled as an update) This approach is True when 
Rollback==True
We can always fall back to regular stream (non-delete stream) if Rollback=False


I don't understand what you're saying here.


In our view we would like to have only one basic operation and that is UPDATE.

1. CREATE will be an update where a realized graph == Empty
2. UPDATE will be an update where realized graph == Full/Partial realized 
(possibly with a delete stream as post operation if Rollback==True)
3. DELETE will be just another update with an empty to_be_realized_graph.


Yes, that goes without saying. In my implementation 
Stack._create_or_update() handles all three operations.
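
A toy illustration of that single UPDATE operation (hypothetical helpers, not
Heat code): the three cases above differ only in the target set of resources
the traversal converges towards.

    def converge(stack, new_template):
        target = set(new_template.resources)   # empty set == stack delete
        existing = set(stack.resources)        # empty set == stack create

        to_update = target                     # check/update everything wanted
        to_clean = existing - target           # everything else is cleaned up

        graph = stack.build_dependency_graph(to_update, to_clean)
        stack.traverse(graph)                  # same traversal in all three cases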



It would be great if we can freeze a stable approach by mid-week as Christmas 
vacations are round the corner.  :) :)


-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Saturday, December 13, 2014 5:43 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-15 Thread Zane Bitter

On 15/12/14 10:15, Anant Patil wrote:

On 13-Dec-14 05:42, Zane Bitter wrote:

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of
resources,

we could get the order from old template. We'd prefer not to have a
graph table.

In theory you could get it by keeping old templates around. But that
means keeping a lot of templates, and it will be hard to keep track
of when you want to delete them. It also means that when starting an
update you'll need to load every existing previous version of the
template in order to calculate the dependencies. It also leaves the
dependencies in an ambiguous state when a resource fails, and
although that can be worked around it will be a giant pain to implement.



Agree that looking to all templates for a delete is not good. But
baring complexity, we feel we could achieve it by way of having an
update and a delete stream for a stack update operation. I will
elaborate in detail in the etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a
couple of different things I decided to store the dependencies in the
Resource table, where we can read or write them virtually for free
because it turns out that we are always reading or updating the
Resource itself at exactly the same time anyway.



Not sure how this will work in an update scenario when a resource does
not change and its dependencies do.


We'll always update the requirements, even when the properties don't
change.



Can you elaborate a bit on rollback.


I didn't do anything special to handle rollback. It's possible that we
need to - obviously the difference in the UpdateReplace + rollback case
is that the replaced resource is now the one we want to keep, and yet
the replaced_by/replaces dependency will force the newer (replacement)
resource to be checked for deletion first, which is an inversion of the
usual order.



This is where the version is so handy! For UpdateReplaced ones, there is


*sometimes*


an older version to go back to. This version could just be template ID,
as I mentioned in another e-mail. All resources are at the current
template ID if they are found in the current template, even if there is
no need to update them. Otherwise, they need to be cleaned-up in the
order given in the previous templates.

I think the template ID is used as version as far as I can see in Zane's
PoC. If the resource template key doesn't match the current template
key, the resource is deleted. The version is a misnomer here, but that
field (template id) is used as though we had versions of resources.


Correct. Because if we had wanted to keep it, we would have updated it 
to the new template version already.



However, I tried to think of a scenario where that would cause problems
and I couldn't come up with one. Provided we know the actual, real-world
dependencies of each resource I don't think the ordering of those two
checks matters.

In fact, I currently can't think of a case where the dependency order
between replacement and replaced resources matters at all. It matters in
the current Heat implementation because resources are artificially
segmented into the current and backup stacks, but with a holistic view
of dependencies that may well not be required. I tried taking that line
out of the simulator code and all the tests still passed. If anybody can
think of a scenario in which it would make a difference, I would be very
interested to hear it.

In any event though, it should be no problem to reverse the direction of
that one edge in these particular circumstances if it does turn out to
be a problem.


We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.


Yeah, I initially ran into this problem too - you have a bunch of nodes
that are waiting on the current node, and now you have to go look them
all up in the database to see what else they're waiting on in order to
tell if they're ready to be triggered.

It turns out the answer is to distribute the writes but centralise the
reads. So at the start of the update, we read all of the Resources,
obtain their dependencies and build one central graph[1]. We then make
that graph available to each resource (either by passing it as a
notification parameter, or storing it somewhere central in the DB that
they will all have to read anyway, i.e. the Stack). But when we update a
dependency we don't update the central graph, we update the individual
Resource so there's no global lock required.

[1]
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/stack.py#L166

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Murugan, Visnusaran


 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Friday, December 12, 2014 6:37 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 On 11/12/14 08:26, Murugan, Visnusaran wrote:
  [Murugan, Visnusaran]
  In case of rollback where we have to cleanup earlier version of
  resources,
  we could get the order from old template. We'd prefer not to have a
  graph table.
 
  In theory you could get it by keeping old templates around. But that
  means keeping a lot of templates, and it will be hard to keep track
  of when you want to delete them. It also means that when starting an
  update you'll need to load every existing previous version of the
  template in order to calculate the dependencies. It also leaves the
  dependencies in an ambiguous state when a resource fails, and
  although that can be worked around it will be a giant pain to implement.
 
 
  Agree that looking to all templates for a delete is not good. But
  barring complexity, we feel we could achieve it by way of having an
  update and a delete stream for a stack update operation. I will
  elaborate in detail in the etherpad sometime tomorrow :)
 
  I agree that I'd prefer not to have a graph table. After trying a
  couple of different things I decided to store the dependencies in the
  Resource table, where we can read or write them virtually for free
  because it turns out that we are always reading or updating the
  Resource itself at exactly the same time anyway.
 
 
  Not sure how this will work in an update scenario when a resource does
  not change and its dependencies do.
 
 We'll always update the requirements, even when the properties don't
 change.
 

Can you elaborate a bit on rollback.  We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.

  Also taking care of deleting resources in order will be an issue.
 
 It works fine.
 
  This implies that there will be different versions of a resource which
  will complicate things even further.
 
 No it doesn't, other than the different versions we already have due to
 UpdateReplace.
 
  This approach reduces DB queries by waiting for completion
  notification
  on a topic. The drawback I see is that delete stack stream will be
  huge as it will have the entire graph. We can always dump such data
  in ResourceLock.data Json and pass a simple flag
  load_stream_from_db to converge RPC call as a workaround for delete
 operation.
 
  This seems to be essentially equivalent to my 'SyncPoint'
  proposal[1], with
  the key difference that the data is stored in-memory in a Heat engine
  rather than the database.
 
  I suspect it's probably a mistake to move it in-memory for similar
  reasons to the argument Clint made against synchronising the marking
  off
  of dependencies in-memory. The database can handle that and the
  problem of making the DB robust against failures of a single machine
  has already been solved by someone else. If we do it in-memory we are
  just creating a single point of failure for not much gain. (I guess
  you could argue it doesn't matter, since if any Heat engine dies
  during the traversal then we'll have to kick off another one anyway,
  but it does limit our options if that changes in the
  future.) [Murugan, Visnusaran] Resource completes, removes itself
  from resource_lock and notifies engine. Engine will acquire parent
  lock and initiate parent only if all its children are satisfied (no child 
  entry in
 resource_lock).
  This will come in place of Aggregator.
 
  Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I
 did.
  The three differences I can see are:
 
  1) I think you are proposing to create all of the sync points at the
  start of the traversal, rather than on an as-needed basis. This is
  probably a good idea. I didn't consider it because of the way my
  prototype evolved, but there's now no reason I can see not to do this.
  If we could move the data to the Resource table itself then we could
  even get it for free from an efficiency point of view.
 
  +1. But we will need engine_id to be stored somewhere for recovery
 purposes (in an easily queried format).
 
 Yeah, so I'm starting to think you're right, maybe the/a Lock table is the 
 right
 thing to use there. We could probably do it within the resource table using
 the same select-for-update to set the engine_id, but I agree that we might
 be starting to jam too much into that one table.
 

Yeah. Unrelated values in the resource table. Upon resource completion we have to 
unset engine_id as well, as compared to dropping a row from resource_lock.
Both are good. Having engine_id in the resource table will reduce DB operations
by half. We should go with just the resource table along with engine_id.
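
A sketch of what claiming a resource via the resource row itself could look
like with SELECT ... FOR UPDATE (assumed SQLAlchemy schema and column names,
not Heat's actual DB API):

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class ResourceRow(Base):
        __tablename__ = 'resource'
        id = Column(Integer, primary_key=True)
        engine_id = Column(String, nullable=True)   # NULL == not claimed

    def acquire(session, resource_id, engine_id):
        row = (session.query(ResourceRow)
                      .filter_by(id=resource_id)
                      .with_for_update()            # row lock until commit/rollback
                      .one())
        if row.engine_id is not None:
            session.rollback()                      # another engine owns it
            return None
        row.engine_id = engine_id                   # claim it
        session.commit()
        return row

    def release(session, resource_id):
        row = session.query(ResourceRow).filter_by(id=resource_id).one()
        row.engine_id = None                        # unset on completion
        session.commit()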

  Sync points are created as-needed. Single resource is enough to restart
 that entire stream.
  I

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Murugan, Visnusaran
Hi zaneb,

Etherpad updated. 

https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

 -Original Message-
 From: Murugan, Visnusaran
 Sent: Friday, December 12, 2014 4:00 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 
 
  -Original Message-
  From: Zane Bitter [mailto:zbit...@redhat.com]
  Sent: Friday, December 12, 2014 6:37 AM
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
  showdown
 
  On 11/12/14 08:26, Murugan, Visnusaran wrote:
   [Murugan, Visnusaran]
   In case of rollback where we have to cleanup earlier version of
   resources,
   we could get the order from old template. We'd prefer not to have a
   graph table.
  
   In theory you could get it by keeping old templates around. But
   that means keeping a lot of templates, and it will be hard to keep
   track of when you want to delete them. It also means that when
   starting an update you'll need to load every existing previous
   version of the template in order to calculate the dependencies. It
   also leaves the dependencies in an ambiguous state when a resource
   fails, and although that can be worked around it will be a giant pain to
 implement.
  
  
   Agree that looking to all templates for a delete is not good. But
   barring complexity, we feel we could achieve it by way of having an
   update and a delete stream for a stack update operation. I will
   elaborate in detail in the etherpad sometime tomorrow :)
  
   I agree that I'd prefer not to have a graph table. After trying a
   couple of different things I decided to store the dependencies in
   the Resource table, where we can read or write them virtually for
   free because it turns out that we are always reading or updating
   the Resource itself at exactly the same time anyway.
  
  
   Not sure how this will work in an update scenario when a resource
   does not change and its dependencies do.
 
  We'll always update the requirements, even when the properties don't
  change.
 
 
 Can you elaborate a bit on rollback.  We had an approach with depends_on
 and needed_by columns in ResourceTable. But dropped it when we figured
 out we had too many DB operations for Update.
 
   Also taking care of deleting resources in order will be an issue.
 
  It works fine.
 
   This implies that there will be different versions of a resource
   which will complicate things even further.
 
  No it doesn't, other than the different versions we already have due
  to UpdateReplace.
 
   This approach reduces DB queries by waiting for completion
   notification
   on a topic. The drawback I see is that delete stack stream will be
   huge as it will have the entire graph. We can always dump such data
   in ResourceLock.data Json and pass a simple flag
   load_stream_from_db to converge RPC call as a workaround for
   delete
  operation.
  
   This seems to be essentially equivalent to my 'SyncPoint'
   proposal[1], with
   the key difference that the data is stored in-memory in a Heat
   engine rather than the database.
  
   I suspect it's probably a mistake to move it in-memory for similar
   reasons to the argument Clint made against synchronising the
   marking off
   of dependencies in-memory. The database can handle that and the
   problem of making the DB robust against failures of a single
   machine has already been solved by someone else. If we do it
   in-memory we are just creating a single point of failure for not
   much gain. (I guess you could argue it doesn't matter, since if any
   Heat engine dies during the traversal then we'll have to kick off
   another one anyway, but it does limit our options if that changes
   in the
   future.) [Murugan, Visnusaran] Resource completes, removes itself
   from resource_lock and notifies engine. Engine will acquire parent
   lock and initiate parent only if all its children are satisfied (no
   child entry in
  resource_lock).
   This will come in place of Aggregator.
  
   Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly
   what I
  did.
   The three differences I can see are:
  
   1) I think you are proposing to create all of the sync points at
   the start of the traversal, rather than on an as-needed basis. This
   is probably a good idea. I didn't consider it because of the way my
   prototype evolved, but there's now no reason I can see not to do this.
   If we could move the data to the Resource table itself then we
   could even get it for free from an efficiency point of view.
  
   +1. But we will need engine_id to be stored somewhere for recovery
  purposes (in an easily queried format).
 
  Yeah, so I'm starting to think you're right, maybe the/a Lock table is
  the right thing to use there. We could probably do it within the
  resource table using the same select-for-update to set the engine_id,
  but I agree

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-12 Thread Zane Bitter

On 12/12/14 05:29, Murugan, Visnusaran wrote:




-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com]
Sent: Friday, December 12, 2014 6:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
showdown

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of
resources,

we could get the order from old template. We'd prefer not to have a
graph table.

In theory you could get it by keeping old templates around. But that
means keeping a lot of templates, and it will be hard to keep track
of when you want to delete them. It also means that when starting an
update you'll need to load every existing previous version of the
template in order to calculate the dependencies. It also leaves the
dependencies in an ambiguous state when a resource fails, and
although that can be worked around it will be a giant pain to implement.



Agree that looking to all templates for a delete is not good. But
barring complexity, we feel we could achieve it by way of having an
update and a delete stream for a stack update operation. I will
elaborate in detail in the etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a
couple of different things I decided to store the dependencies in the
Resource table, where we can read or write them virtually for free
because it turns out that we are always reading or updating the
Resource itself at exactly the same time anyway.



Not sure how this will work in an update scenario when a resource does
not change and its dependencies do.


We'll always update the requirements, even when the properties don't
change.



Can you elaborate a bit on rollback.


I didn't do anything special to handle rollback. It's possible that we 
need to - obviously the difference in the UpdateReplace + rollback case 
is that the replaced resource is now the one we want to keep, and yet 
the replaced_by/replaces dependency will force the newer (replacement) 
resource to be checked for deletion first, which is an inversion of the 
usual order.


However, I tried to think of a scenario where that would cause problems 
and I couldn't come up with one. Provided we know the actual, real-world 
dependencies of each resource I don't think the ordering of those two 
checks matters.


In fact, I currently can't think of a case where the dependency order 
between replacement and replaced resources matters at all. It matters in 
the current Heat implementation because resources are artificially 
segmented into the current and backup stacks, but with a holistic view 
of dependencies that may well not be required. I tried taking that line 
out of the simulator code and all the tests still passed. If anybody can 
think of a scenario in which it would make a difference, I would be very 
interested to hear it.


In any event though, it should be no problem to reverse the direction of 
that one edge in these particular circumstances if it does turn out to 
be a problem.



We had an approach with depends_on
and needed_by columns in ResourceTable. But dropped it when we figured out
we had too many DB operations for Update.


Yeah, I initially ran into this problem too - you have a bunch of nodes 
that are waiting on the current node, and now you have to go look them 
all up in the database to see what else they're waiting on in order to 
tell if they're ready to be triggered.


It turns out the answer is to distribute the writes but centralise the 
reads. So at the start of the update, we read all of the Resources, 
obtain their dependencies and build one central graph[1]. We then make 
that graph available to each resource (either by passing it as a 
notification parameter, or storing it somewhere central in the DB that 
they will all have to read anyway, i.e. the Stack). But when we update a 
dependency we don't update the central graph, we update the individual 
Resource so there's no global lock required.


[1] 
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/stack.py#L166-L168



Also taking care of deleting resources in order will be an issue.


It works fine.


This implies that there will be different versions of a resource which
will complicate things even further.


No it doesn't, other than the different versions we already have due to
UpdateReplace.


This approach reduces DB queries by waiting for completion
notification

on a topic. The drawback I see is that delete stack stream will be
huge as it will have the entire graph. We can always dump such data
in ResourceLock.data Json and pass a simple flag
load_stream_from_db to converge RPC call as a workaround for delete

operation.


This seems to be essentially equivalent to my 'SyncPoint'
proposal[1], with

the key difference that the data is stored in-memory in a Heat engine
rather than the database.


I suspect it's probably a mistake to move

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Murugan, Visnusaran


 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Wednesday, December 10, 2014 11:17 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
 
 You really need to get a real email client with quoting support ;)
Apologies :) I screwed up my mail client's configuration.

 
 On 10/12/14 06:42, Murugan, Visnusaran wrote:
  Well, we still have to persist the dependencies of each version of a
 resource _somehow_, because otherwise we can't know how to clean them
 up in the correct order. But what I think you meant to say is that this
 approach doesn't require it to be persisted in a separate table where the
 rows are marked as traversed as we work through the graph.
 
  [Murugan, Visnusaran]
  In case of rollback where we have to cleanup earlier version of resources,
 we could get the order from old template. We'd prefer not to have a graph
 table.
 
 In theory you could get it by keeping old templates around. But that means
 keeping a lot of templates, and it will be hard to keep track of when you want
 to delete them. It also means that when starting an update you'll need to
 load every existing previous version of the template in order to calculate the
 dependencies. It also leaves the dependencies in an ambiguous state when a
 resource fails, and although that can be worked around it will be a giant pain
 to implement.
 

Agree that looking at all templates for a delete is not good. But barring 
complexity, we feel we could achieve it by way of having an update and a 
delete stream for a stack update operation. I will elaborate in detail in the
etherpad sometime tomorrow :)

 I agree that I'd prefer not to have a graph table. After trying a couple of
 different things I decided to store the dependencies in the Resource table,
 where we can read or write them virtually for free because it turns out that
 we are always reading or updating the Resource itself at exactly the same
 time anyway.
 

Not sure how this will work in an update scenario when a resource does not 
change
and its dependencies do. Also taking care of deleting resources in order will
be an issue. This implies that there will be different versions of a resource 
which will complicate things even further.

  This approach reduces DB queries by waiting for completion notification
 on a topic. The drawback I see is that delete stack stream will be huge as it
 will have the entire graph. We can always dump such data in
 ResourceLock.data Json and pass a simple flag load_stream_from_db to
 converge RPC call as a workaround for delete operation.
 
  This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with
 the key difference that the data is stored in-memory in a Heat engine rather
 than the database.
 
  I suspect it's probably a mistake to move it in-memory for similar
  reasons to the argument Clint made against synchronising the marking off
 of dependencies in-memory. The database can handle that and the problem
 of making the DB robust against failures of a single machine has already been
 solved by someone else. If we do it in-memory we are just creating a single
 point of failure for not much gain. (I guess you could argue it doesn't 
 matter,
 since if any Heat engine dies during the traversal then we'll have to kick off
 another one anyway, but it does limit our options if that changes in the
 future.) [Murugan, Visnusaran] Resource completes, removes itself from
 resource_lock and notifies engine. Engine will acquire parent lock and 
 initiate
 parent only if all its children are satisfied (no child entry in 
 resource_lock).
 This will come in place of Aggregator.
 
 Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I did.
 The three differences I can see are:
 
 1) I think you are proposing to create all of the sync points at the start of 
 the
 traversal, rather than on an as-needed basis. This is probably a good idea. I
 didn't consider it because of the way my prototype evolved, but there's now
 no reason I can see not to do this.
 If we could move the data to the Resource table itself then we could even
 get it for free from an efficiency point of view.

+1. But we will need engine_id to be stored somewhere for recovery purposes 
(in an easily queried format).
Sync points are created as-needed. Single resource is enough to restart that 
entire stream.
I think there is a disconnect in our understanding. I will detail it as well in 
the etherpad.

 2) You're using a single list from which items are removed, rather than two
 lists (one static, and one to which items are added) that get compared.
 Assuming (1) then this is probably a good idea too.

Yeah. We have a single list per active stream which works by removing 
complete/satisfied resources from it.

 3) You're suggesting to notify the engine unconditionally and let the engine
 decide if the list is empty. That's probably not a good idea - not only does

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Zane Bitter

On 11/12/14 01:14, Anant Patil wrote:

On 04-Dec-14 10:49, Zane Bitter wrote:

On 01/12/14 02:02, Anant Patil wrote:

On GitHub:https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I
don't understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off
any actual updates to check if the resource will change. This is
impossible to do in general, since a resource may obtain a property
value from an attribute of another resource and there is no way to know
whether an update to said other resource would cause a change in the
attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that
looks like a simple fix, I'm now worried about the level to which this
code has been tested.


We were working on new branch and as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:
https://github.com/anantpatil/heat-convergence-poc/tree/graph-version

When a new resource is taken for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will have all the get_attr resolved.



I'm also trying to wrap my head around how resources are cleaned up in
dependency order. If I understand correctly, you store in the
ResourceGraph table the dependencies between various resource names in
the current template (presumably there could also be some left around
from previous templates too?). For each resource name there may be a
number of rows in the Resource table, each with an incrementing version.
As far as I can tell though, there's nowhere that the dependency graph
for _previous_ templates is persisted? So if the dependency order
changes in the template we have no way of knowing the correct order to
clean up in any more? (There's not even a mechanism to associate a
resource version with a particular template, which might be one avenue
by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I
added a scenario to my test framework to exercise it and discovered that
my implementation was also buggy. Here's the fix:
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40



Thanks for pointing this out Zane. We too had a buggy implementation for
handling inverted dependency. I had a hard look at our algorithm where
we were continuously merging the edges from new template into the edges
from previous updates. It was an optimized way of traversing the graph
in both forward and reverse order without missing any resources. But,
when the dependencies are inverted, this wouldn't work.

We have changed our algorithm. The changes in edges are noted down in
DB, only the delta of edges from previous template is calculated and
kept. At any given point of time, the graph table has all the edges from
current template and delta from previous templates. Each edge has
template ID associated with it.


The thing is, the cleanup dependencies aren't really about the template. 
The real resources really depend on other real resources. You can't 
delete a Volume before its VolumeAttachment, not because it says so in 
the template but because it will fail if you try. The template can give 
us a rough guide in advance to what those dependencies will be, but if 
that's all we keep then we are discarding information.


There may be multiple versions of a resource corresponding to one 
template version. Even worse, the actual dependencies of a resource 
change on a smaller time scale than an entire stack update (this is the 
reason the current implementation updates the template one resource at a 
time as we go).


Given that our Resource entries in the DB are in 1:1 correspondence with 
actual resources (we create a new one whenever we need to replace the 
underlying resource), I found it makes the most conceptual and practical 
sense to store the requirements in the resource itself, and update them 
at the time they actually change in the real world (bonus: introduces no 
new locking issues and no extra DB writes). I settled on this after a 
legitimate attempt at trying other options, but they didn't work out: 
https://github.com/zaneb/heat-convergence-prototype/commit/a62958342e8583f74e2aca90f6239ad457ba984d
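
A rough in-memory sketch of that idea (hypothetical classes, not the prototype
code): the requirements live on the resource itself and are rewritten at the
moment they really change, for example when a replacement is created.

    class Res(object):
        def __init__(self, rid, name, requires):
            self.id = rid
            self.name = name
            self.requires = set(requires)   # ids of resources this one needs

    def replace(old, new_id):
        # the replacement needs everything the old resource needed, plus the
        # old resource itself until it has been cleaned up
        return Res(new_id, old.name, old.requires | {old.id})

    server = Res(1, 'server', requires=set())
    volume = Res(2, 'volume', requires={server.id})
    new_server = replace(server, new_id=3)          # UpdateReplace of 'server'
    volume.requires = (volume.requires - {server.id}) | {new_server.id}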



For resource clean up, we start from the
first template (template which was completed and updates were made on
top of it, empty template otherwise), and move towards the current
template in the order in which the updates were issued, and for each
template the graph (edges if found for the template) is traversed in
reverse order and resources are cleaned-up.


I'm pretty sure this is backwards - you'll need to clean up newer 
resources first because they may reference resources from older 
templates. Also if you have a stubborn old resource that won't 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-11 Thread Zane Bitter

On 11/12/14 08:26, Murugan, Visnusaran wrote:

[Murugan, Visnusaran]
In case of rollback where we have to cleanup earlier version of resources,

we could get the order from old template. We'd prefer not to have a graph
table.

In theory you could get it by keeping old templates around. But that means
keeping a lot of templates, and it will be hard to keep track of when you want
to delete them. It also means that when starting an update you'll need to
load every existing previous version of the template in order to calculate the
dependencies. It also leaves the dependencies in an ambiguous state when a
resource fails, and although that can be worked around it will be a giant pain
to implement.



Agree that looking at all templates for a delete is not good. But barring
complexity, we feel we could achieve it by way of having an update and a
delete stream for a stack update operation. I will elaborate in detail in the
etherpad sometime tomorrow :)


I agree that I'd prefer not to have a graph table. After trying a couple of
different things I decided to store the dependencies in the Resource table,
where we can read or write them virtually for free because it turns out that
we are always reading or updating the Resource itself at exactly the same
time anyway.



Not sure how this will work in an update scenario when a resource does not 
change
and its dependencies do.


We'll always update the requirements, even when the properties don't change.


Also taking care of deleting resources in order will
be an issue.


It works fine.


This implies that there will be different versions of a resource which
will complicate things even further.


No it doesn't, other than the different versions we already have due to 
UpdateReplace.



This approach reduces DB queries by waiting for completion notification

on a topic. The drawback I see is that delete stack stream will be huge as it
will have the entire graph. We can always dump such data in
ResourceLock.data Json and pass a simple flag load_stream_from_db to
converge RPC call as a workaround for delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with

the key difference that the data is stored in-memory in a Heat engine rather
than the database.


I suspect it's probably a mistake to move it in-memory for similar
reasons to the argument Clint made against synchronising the marking off

of dependencies in-memory. The database can handle that and the problem
of making the DB robust against failures of a single machine has already been
solved by someone else. If we do it in-memory we are just creating a single
point of failure for not much gain. (I guess you could argue it doesn't matter,
since if any Heat engine dies during the traversal then we'll have to kick off
another one anyway, but it does limit our options if that changes in the
future.) [Murugan, Visnusaran] Resource completes, removes itself from
resource_lock and notifies engine. Engine will acquire parent lock and initiate
parent only if all its children are satisfied (no child entry in resource_lock).
This will come in place of Aggregator.

Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I did.
The three differences I can see are:

1) I think you are proposing to create all of the sync points at the start of 
the
traversal, rather than on an as-needed basis. This is probably a good idea. I
didn't consider it because of the way my prototype evolved, but there's now
no reason I can see not to do this.
If we could move the data to the Resource table itself then we could even
get it for free from an efficiency point of view.


+1. But we will need engine_id to be stored somewhere for recovery purposes 
(in an easily queried format).


Yeah, so I'm starting to think you're right, maybe the/a Lock table is 
the right thing to use there. We could probably do it within the 
resource table using the same select-for-update to set the engine_id, 
but I agree that we might be starting to jam too much into that one table.



Sync points are created as-needed. Single resource is enough to restart that 
entire stream.
I think there is a disconnect in our understanding. I will detail it as well in 
the etherpad.


OK, that would be good.


2) You're using a single list from which items are removed, rather than two
lists (one static, and one to which items are added) that get compared.
Assuming (1) then this is probably a good idea too.


Yeah. We have a single list per active stream which works by removing
complete/satisfied resources from it.


I went to change this and then remembered why I did it this way: the 
sync point is also storing data about the resources that are triggering 
it. Part of this is the RefID and attributes, and we could replace that 
by storing that data in the Resource itself and querying it rather than 
having it passed in via the notification. But the other part is the 
ID/key of those resources, which we _need_ to know in order to update 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-10 Thread Murugan, Visnusaran


-Original Message-
From: Zane Bitter [mailto:zbit...@redhat.com] 
Sent: Tuesday, December 9, 2014 3:50 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On 08/12/14 07:00, Murugan, Visnusaran wrote:

 Hi Zane and Michael,

 Please have a look @ 
 https://etherpad.openstack.org/p/execution-stream-and-aggregator-based
 -convergence

 Updated with a combined approach which does not require persisting graph and 
 backup stack removal.

Well, we still have to persist the dependencies of each version of a resource 
_somehow_, because otherwise we can't know how to clean them up in the correct 
order. But what I think you meant to say is that this approach doesn't require 
it to be persisted in a separate table where the rows are marked as traversed 
as we work through the graph.

[Murugan, Visnusaran] 
In case of rollback where we have to cleanup earlier version of resources, we 
could get the order from old template. We'd prefer not to have a graph table.

 This approach reduces DB queries by waiting for completion notification on a 
 topic. The drawback I see is that delete stack stream will be huge as it will 
 have the entire graph. We can always dump such data in ResourceLock.data Json 
 and pass a simple flag load_stream_from_db to converge RPC call as a 
 workaround for delete operation.

This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the 
key difference that the data is stored in-memory in a Heat engine rather than 
the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to 
the argument Clint made against synchronising the marking off of dependencies 
in-memory. The database can handle that and the problem of making the DB robust 
against failures of a single machine has already been solved by someone else. 
If we do it in-memory we are just creating a single point of failure for not 
much gain. (I guess you could argue it doesn't matter, since if any Heat engine 
dies during the traversal then we'll have to kick off another one anyway, but 
it does limit our options if that changes in the future.)
[Murugan, Visnusaran] Resource completes, removes itself from resource_lock and 
notifies engine. Engine will acquire parent lock and initiate parent only if 
all its children are satisfied (no child entry in resource_lock). This will 
come in place of Aggregator.

It's not clear to me how the 'streams' differ in practical terms from just 
passing a serialisation of the Dependencies object, other than being 
incomprehensible to me ;). The current Dependencies implementation
(1) is a very generic implementation of a DAG, (2) works and has plenty of unit 
tests, (3) has, with I think one exception, a pretty straightforward API, (4) 
has a very simple serialisation, returned by the edges() method, which can be 
passed back into the constructor to recreate it, and (5) has an API that is to 
some extent relied upon by resources, and so won't likely be removed outright 
in any event. 
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.
[Murugan, Visnusaran] Our thought was to reduce payload size (template/graph). 
Just planning for the worst-case scenario (a million-resource stack). We could always 
dump them in ResourceLock.data to be loaded by Worker.
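
For reference, the round-trip being described (serialise via edges(), rebuild
via the constructor) looks roughly like this; the class below is a stand-in
for illustration, not the actual heat.engine.dependencies module:

    class MiniDependencies(object):
        def __init__(self, edges=()):
            # each edge is a (requirer, required) tuple
            self._edges = set(edges)

        def edges(self):
            return list(self._edges)           # trivially JSON-serialisable

        def required_by(self, node):
            return [r for r, req in self._edges if req == node]

    deps = MiniDependencies([('B', 'A'), ('C', 'B')])
    wire = deps.edges()                        # e.g. stored in the DB or an RPC message
    restored = MiniDependencies(wire)          # reconstructs an equivalent graph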

I think the difference may be that the streams only include the
*shortest* paths (there will often be more than one) to each resource. i.e.

  A --- B --- C
  ^           |
  |           |
  +-----------+

can just be written as:

  A --- B --- C

because there's only one order in which that can execute anyway. (If we're 
going to do this though, we should just add a method to the dependencies.Graph 
class to delete redundant edges, not create a whole new data structure.) There 
is a big potential advantage here in that it reduces the theoretical maximum 
number of edges in the graph from O(n^2) to O(n). (Although in practice real 
templates are typically not likely to have such dense graphs.)
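
A hedged sketch of what "delete redundant edges" could look like for a small
DAG (illustrative only, not prototype code): drop a direct edge when its
target is still reachable through some longer path.

    def reachable(graph, start, target, skip_edge):
        """DFS over edges, ignoring the one candidate edge being tested."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            for nxt in graph.get(node, ()):
                if (node, nxt) != skip_edge:
                    stack.append(nxt)
        return False

    def remove_redundant_edges(graph):
        """graph: dict mapping each node to the set of nodes it points at."""
        for node in list(graph):
            for nxt in list(graph[node]):
                if reachable(graph, node, nxt, skip_edge=(node, nxt)):
                    graph[node].discard(nxt)    # a longer path already exists
        return graph

    g = {'A': {'B', 'C'}, 'B': {'C'}, 'C': set()}
    print(remove_redundant_edges(g))            # {'A': {'B'}, 'B': {'C'}, 'C': set()}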

There's a downside to this too though: say that A in the above diagram is 
replaced during an update. In that case not only B but also C will need to 
figure out what the latest version of A is. One option here is to pass that 
data along via B, but that will become very messy to implement in a non-trivial 
example. The other would be for C to go search in the database for resources 
with the same name as A and the current traversal_id marked as the latest. But 
that not only creates a concurrency problem we didn't have before (A could have 
been updated with a new traversal_id at some point after C had established that 
the current traversal was still valid but before it went looking for A), it 
also eliminates all of the performance gains from removing that edge in the 
first place.

[1]
https://github.com/zaneb/heat-convergence

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-10 Thread Zane Bitter

You really need to get a real email client with quoting support ;)

On 10/12/14 06:42, Murugan, Visnusaran wrote:

Well, we still have to persist the dependencies of each version of a resource 
_somehow_, because otherwise we can't know how to clean them up in the correct 
order. But what I think you meant to say is that this approach doesn't require 
it to be persisted in a separate table where the rows are marked as traversed 
as we work through the graph.

[Murugan, Visnusaran]
In case of rollback, where we have to clean up earlier versions of resources, we 
could get the order from the old template. We'd prefer not to have a graph table.


In theory you could get it by keeping old templates around. But that 
means keeping a lot of templates, and it will be hard to keep track of 
when you want to delete them. It also means that when starting an update 
you'll need to load every existing previous version of the template in 
order to calculate the dependencies. It also leaves the dependencies in 
an ambiguous state when a resource fails, and although that can be 
worked around it will be a giant pain to implement.


I agree that I'd prefer not to have a graph table. After trying a couple 
of different things I decided to store the dependencies in the Resource 
table, where we can read or write them virtually for free because it 
turns out that we are always reading or updating the Resource itself at 
exactly the same time anyway.
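
To make that concrete, the round-trip is essentially the following (a minimal 
sketch, assuming the edges()-based serialisation of the Dependencies object 
discussed in this thread; the exact spelling - deps.graph().edges() vs. 
deps.edges() - and the 'data' column are illustrative only):

    # Sketch: serialise the dependency edges onto the Resource row we are
    # already reading/writing, and rebuild the DAG from them on load.
    import json
    from heat.engine import dependencies

    deps = dependencies.Dependencies([('B', 'A'), ('C', 'B'), ('C', 'A')])

    resource_row = {}                     # stand-in for the Resource DB row
    resource_row['data'] = json.dumps(
        [list(edge) for edge in deps.graph().edges()])

    restored = dependencies.Dependencies(
        [tuple(edge) for edge in json.loads(resource_row['data'])])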



This approach reduces DB queries by waiting for completion notifications on a topic. The 
drawback I see is that the delete-stack stream will be huge, as it will contain the entire graph. 
We can always dump such data into ResourceLock.data as JSON and pass a simple flag 
load_stream_from_db to the converge RPC call as a workaround for the delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the 
key difference that the data is stored in-memory in a Heat engine rather than 
the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to 
the argument Clint made against synchronising the marking off of dependencies 
in-memory. The database can handle that and the problem of making the DB robust 
against failures of a single machine has already been solved by someone else. 
If we do it in-memory we are just creating a single point of failure for not 
much gain. (I guess you could argue it doesn't matter, since if any Heat engine 
dies during the traversal then we'll have to kick off another one anyway, but 
it does limit our options if that changes in the future.)
[Murugan, Visnusaran] A resource completes, removes itself from resource_lock and 
notifies the engine. The engine will acquire the parent lock and initiate the parent 
only if all its children are satisfied (no child entry in resource_lock). This will 
take the place of the Aggregator.


Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what 
I did. The three differences I can see are:


1) I think you are proposing to create all of the sync points at the 
start of the traversal, rather than on an as-needed basis. This is 
probably a good idea. I didn't consider it because of the way my 
prototype evolved, but there's now no reason I can see not to do this. 
If we could move the data to the Resource table itself then we could 
even get it for free from an efficiency point of view.
2) You're using a single list from which items are removed, rather than 
two lists (one static, and one to which items are added) that get 
compared. Assuming (1) then this is probably a good idea too.
3) You're suggesting to notify the engine unconditionally and let the 
engine decide if the list is empty. That's probably not a good idea - 
not only does it require extra reads, it introduces a race condition 
that you then have to solve (it can be solved, it's just more work). 
Since the update to remove a child from the list is atomic, it's best to 
just trigger the engine only if the list is now empty.
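
For the record, here is roughly what I mean by (3) - a sketch only, with an 
illustrative table rather than the actual schema, in SQLAlchemy 1.4+ style. 
The guarded UPDATE is the atomic part; exactly one worker sees the count reach 
zero and goes on to trigger the parent's check:

    import sqlalchemy as sa

    metadata = sa.MetaData()
    sync_point = sa.Table(
        'sync_point', metadata,
        sa.Column('resource_name', sa.String, primary_key=True),
        sa.Column('pending', sa.Integer, nullable=False))

    def satisfy(conn, resource_name):
        """Mark one requirement satisfied; return True iff we were the last."""
        while True:
            old = conn.execute(
                sa.select(sync_point.c.pending)
                .where(sync_point.c.resource_name == resource_name)
            ).scalar_one()
            result = conn.execute(
                sync_point.update()
                .where(sync_point.c.resource_name == resource_name,
                       sync_point.c.pending == old)
                .values(pending=old - 1))
            if result.rowcount == 1:
                return old - 1 == 0      # only the last child sees zero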



It's not clear to me how the 'streams' differ in practical terms from just 
passing a serialisation of the Dependencies object, other than being 
incomprehensible to me ;). The current Dependencies implementation
(1) is a very generic implementation of a DAG, (2) works and has plenty of unit 
tests, (3) has, with I think one exception, a pretty straightforward API, (4) 
has a very simple serialisation, returned by the edges() method, which can be 
passed back into the constructor to recreate it, and (5) has an API that is to 
some extent relied upon by resources, and so won't likely be removed outright 
in any event.
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.
[Murugan, Visnusaran] Our thought was to reduce the payload size (template/graph), 
just planning for the worst-case scenario (a million-resource stack). We could always 
dump them in ResourceLock.data to be loaded by the Worker.


If there's a smaller representation of a graph than a list of edges then 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-08 Thread Zane Bitter

On 08/12/14 07:00, Murugan, Visnusaran wrote:


Hi Zane & Michael,

Please have a look @ 
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

Updated with a combined approach which does not require persisting graph and 
backup stack removal.


Well, we still have to persist the dependencies of each version of a 
resource _somehow_, because otherwise we can't know how to clean them up 
in the correct order. But what I think you meant to say is that this 
approach doesn't require it to be persisted in a separate table where 
the rows are marked as traversed as we work through the graph.



This approach reduces DB queries by waiting for completion notifications on a topic. The 
drawback I see is that the delete-stack stream will be huge, as it will contain the entire graph. 
We can always dump such data into ResourceLock.data as JSON and pass a simple flag 
load_stream_from_db to the converge RPC call as a workaround for the delete operation.


This seems to be essentially equivalent to my 'SyncPoint' proposal[1], 
with the key difference that the data is stored in-memory in a Heat 
engine rather than the database.


I suspect it's probably a mistake to move it in-memory for similar 
reasons to the argument Clint made against synchronising the marking off 
of dependencies in-memory. The database can handle that and the problem 
of making the DB robust against failures of a single machine has already 
been solved by someone else. If we do it in-memory we are just creating 
a single point of failure for not much gain. (I guess you could argue it 
doesn't matter, since if any Heat engine dies during the traversal then 
we'll have to kick off another one anyway, but it does limit our options 
if that changes in the future.)


It's not clear to me how the 'streams' differ in practical terms from 
just passing a serialisation of the Dependencies object, other than 
being incomprehensible to me ;). The current Dependencies implementation 
(1) is a very generic implementation of a DAG, (2) works and has plenty 
of unit tests, (3) has, with I think one exception, a pretty 
straightforward API, (4) has a very simple serialisation, returned by 
the edges() method, which can be passed back into the constructor to 
recreate it, and (5) has an API that is to some extent relied upon by 
resources, and so won't likely be removed outright in any event. 
Whatever code we need to handle dependencies ought to just build on this 
existing implementation.


I think the difference may be that the streams only include the 
*shortest* paths (there will often be more than one) to each resource. i.e.


 A --- B --- C
 ^           |
 |           |
 +-----------+

can just be written as:

 A --- B --- C

because there's only one order in which that can execute anyway. (If 
we're going to do this though, we should just add a method to the 
dependencies.Graph class to delete redundant edges, not create a whole 
new data structure.) There is a big potential advantage here in that it 
reduces the theoretical maximum number of edges in the graph from O(n^2) 
to O(n). (Although in practice real templates are typically not likely 
to have such dense graphs.)
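
Something like the following naive sketch is all that would be needed - it 
works on a plain list of (requirer, required) edges rather than the real 
dependencies.Graph, where the actual method would live; an edge (a, b) in a 
DAG is redundant iff b is still reachable from a via some other path:

    def prune_redundant_edges(edges):
        succ = {}
        for a, b in edges:
            succ.setdefault(a, set()).add(b)

        def reachable(start, target, skip_edge):
            # depth-first search that ignores the edge under consideration
            stack, seen = [start], set()
            while stack:
                node = stack.pop()
                for nxt in succ.get(node, ()):
                    if (node, nxt) == skip_edge or nxt in seen:
                        continue
                    if nxt == target:
                        return True
                    seen.add(nxt)
                    stack.append(nxt)
            return False

        return [(a, b) for a, b in edges
                if not reachable(a, b, skip_edge=(a, b))]

    # prune_redundant_edges([('C', 'B'), ('B', 'A'), ('C', 'A')])
    #   -> [('C', 'B'), ('B', 'A')]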


There's a downside to this too though: say that A in the above diagram 
is replaced during an update. In that case not only B but also C will 
need to figure out what the latest version of A is. One option here is 
to pass that data along via B, but that will become very messy to 
implement in a non-trivial example. The other would be for C to go 
search in the database for resources with the same name as A and the 
current traversal_id marked as the latest. But that not only creates a 
concurrency problem we didn't have before (A could have been updated 
with a new traversal_id at some point after C had established that the 
current traversal was still valid but before it went looking for A), it 
also eliminates all of the performance gains from removing that edge in 
the first place.


[1] 
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/sync_point.py



To stop the current stack operation, we will use your traversal_id based approach.


+1 :)


If you feel the Aggregator model creates too many queues, then we might have to 
poll the DB to get resource status. (Which will impact performance adversely :) )


For the reasons given above I would vote for doing this in the DB. I 
agree there will be a performance penalty for doing so, because we'll be 
paying for robustness.



Lock table: name (unique - Resource_id), stack_id, engine_id, data (JSON to 
store the stream dict)


Based on our call on Thursday, I think you're taking the idea of the 
Lock table too literally. The point of referring to locks is that we can 
use the same concepts as the Lock table relies on to do atomic updates 
on a particular row of the database, and we can use those atomic updates 
to prevent race conditions 
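
i.e. a guarded UPDATE whose rowcount tells you whether you won the race, along 
the lines of this sketch (SQLAlchemy 1.4+ style; the table is illustrative, 
not a schema proposal):

    import sqlalchemy as sa

    metadata = sa.MetaData()
    resource_lock = sa.Table(
        'resource_lock', metadata,
        sa.Column('name', sa.String, primary_key=True),
        sa.Column('engine_id', sa.String, nullable=True))

    def try_acquire(conn, name, engine_id):
        # rowcount == 1 means we claimed the row; 0 means another engine
        # already holds it. No SELECT-then-UPDATE race window.
        result = conn.execute(
            resource_lock.update()
            .where(resource_lock.c.name == name,
                   resource_lock.c.engine_id.is_(None))
            .values(engine_id=engine_id))
        return result.rowcount == 1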

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-03 Thread Zane Bitter

On 01/12/14 02:02, Anant Patil wrote:

On GitHub: https://github.com/anantpatil/heat-convergence-poc


I'm trying to review this code at the moment, and finding some stuff I 
don't understand:


https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off 
any actual updates to check if the resource will change. This is 
impossible to do in general, since a resource may obtain a property 
value from an attribute of another resource and there is no way to know 
whether an update to said other resource would cause a change in the 
attribute value.


In addition, no attempt to catch UpdateReplace is made. Although that 
looks like a simple fix, I'm now worried about the level to which this 
code has been tested.



I'm also trying to wrap my head around how resources are cleaned up in 
dependency order. If I understand correctly, you store in the 
ResourceGraph table the dependencies between various resource names in 
the current template (presumably there could also be some left around 
from previous templates too?). For each resource name there may be a 
number of rows in the Resource table, each with an incrementing version. 
As far as I can tell though, there's nowhere that the dependency graph 
for _previous_ templates is persisted? So if the dependency order 
changes in the template we have no way of knowing the correct order to 
clean up in any more? (There's not even a mechanism to associate a 
resource version with a particular template, which might be one avenue 
by which to recover the dependencies.)


I think this is an important case we need to be able to handle, so I 
added a scenario to my test framework to exercise it and discovered that 
my implementation was also buggy. Here's the fix: 
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40




It was difficult, for me personally, to completely understand Zane's PoC
and how it would lay the foundation for the aforementioned design goals. It
would be very helpful to have Zane's understanding here. I could
understand that there are ideas like async message passing and notifying
the parent which we also subscribe to.


So I guess the thing to note is that there are essentially two parts to 
my PoC:
1) A simulation framework that takes what will be, in the final 
implementation, multiple tasks running in parallel in separate processes 
and talking to a database, and replaces it with an event loop that runs 
the tasks sequentially in a single process with an in-memory data store. 
I could have built a more realistic simulator using Celery or something, 
but I preferred this way as it offers deterministic tests.

2) A toy implementation of Heat on top of this framework.

The files map roughly to Heat something like this:

converge.engine       - heat.engine.service
converge.stack        - heat.engine.stack
converge.resource     - heat.engine.resource
converge.template     - heat.engine.template
converge.dependencies - actually is heat.engine.dependencies
converge.sync_point   - no equivalent
converge.converger    - no equivalent (this is convergence worker)
converge.reality      - represents the actual OpenStack services

For convenience, I just use the @asynchronous decorator to turn an 
ordinary method call into a simulated message.
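
Conceptually the decorator amounts to no more than this (a simplified sketch, 
not the actual simulator code): instead of executing immediately, the 
decorated method appends the call to a queue, and the single event loop later 
pops and runs these "messages" one at a time, deterministically.

    import collections
    import functools

    call_queue = collections.deque()

    def asynchronous(method):
        @functools.wraps(method)
        def enqueue(*args, **kwargs):
            # defer the call instead of running it now
            call_queue.append(lambda: method(*args, **kwargs))
        return enqueue

    def run_event_loop():
        while call_queue:
            call_queue.popleft()()    # deliver one simulated message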


The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just 
updates) we create any new resources in the DB and calculate the dependency 
graph for the update from the data in the DB and template. This graph is 
the same one used by updates in Heat currently, so it contains both the 
forward and reverse (cleanup) dependencies. The stack update then kicks 
off checks of all the leaf nodes, passing the pre-calculated dependency 
graph.
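
i.e. roughly this (a toy sketch over a plain (requirer, required) edge list; 
check_resource() stands in for the real trigger message):

    def kick_off(edges, check_resource):
        # A node with no outstanding requirements is a leaf of the update
        # graph and can be checked straight away, with the whole
        # pre-calculated graph passed along in the message.
        nodes = {n for edge in edges for n in edge if n is not None}
        have_requirements = {rqr for rqr, rqd in edges if rqd is not None}
        for leaf in nodes - have_requirements:
            check_resource(leaf, graph=edges)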


Each resource check may result in a call to the create(), update() or 
delete() methods of a Resource plugin. The resource also reads any 
attributes that will be required from it. Once this is complete, it 
triggers any dependent resources that are ready, or updates a SyncPoint 
in the database if there are dependent resources that have multiple 
requirements. The message triggering the next resource will contain the 
dependency graph again, as well as the RefIds and required attributes of 
any resources it depends on.


The new dependencies thus created are added to the resource itself in 
the database at the time it is checked, allowing it to record the 
changes caused by a requirement being unexpectedly replaced without 
needing a global lock on anything.


When cleaning up resources, we also endeavour to remove any that are 
successfully deleted from the dependencies graph.


Each traversal has a unique ID that is both stored in the stack and 
passed down through the resource check triggers. (At present this is the 
template ID, but it may make more sense to have a unique ID since old 
template 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Anant Patil
On Tue, Dec 2, 2014 at 1:49 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
  On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
   Hi Zane,
  
  
  
   At this stage our implementation (as mentioned in wiki
   https://wiki.openstack.org/wiki/Heat/ConvergenceDesign) achieves
 your
   design goals.
  
  
  
   1.   In case of a parallel update, our implementation adjusts graph
   according to new template and waits for dispatched resource tasks to
   complete.
  
   2.   Reason for basing our PoC on Heat code:
  
   a.   To solve contention processing parent resource by all
 dependent
   resources in parallel.
  
   b.  To avoid porting issue from PoC to HeatBase. (just to be aware
   of potential issues asap)
  
   3.   Resource timeout would be helpful, but I guess its resource
   specific and has to come from template and default values from plugins.
  
   4.   We see resource notification aggregation and processing next
   level of resources without contention and with minimal DB usage as the
   problem area. We are working on the following approaches in *parallel.*
  
   a.   Use a Queue per stack to serialize notification.
  
   b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
   convergence upon first child notification. Subsequent children who fail
   to get parent resource lock will directly send message to waiting
 parent
   task (topic=stack_id.parent_resource_id)
  
   Based on performance/feedback we can select either or a mashed version.
  
  
  
   Advantages:
  
   1.   Failed Resource tasks can be re-initiated after ProcessLog
   table lookup.
  
   2.   One worker == one resource.
  
   3.   Supports concurrent updates
  
   4.   Delete == update with empty stack
  
   5.   Rollback == update to previous known good/completed stack.
  
  
  
   Disadvantages:
  
   1.   Still holds stackLock (WIP to remove with ProcessLog)
  
  
  
   Completely understand your concern on reviewing our code, since commits
   are numerous and there is change of course at places.  Our start commit
   is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not
  help ☺
  
  
  
   Your Thoughts.
  
  
  
   Happy Thanksgiving.
  
   Vishnu.
  
  
  
   *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
   *Sent:* Thursday, November 27, 2014 9:46 AM
   *To:* OpenStack Development Mailing List (not for usage questions)
   *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
  
  
  
   On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter zbit...@redhat.com
   mailto:zbit...@redhat.com wrote:
  
   A bunch of us have spent the last few weeks working independently
 on
   proof of concept designs for the convergence architecture. I think
   those efforts have now reached a sufficient level of maturity that
   we should start working together on synthesising them into a plan
   that everyone can forge ahead with. As a starting point I'm going
 to
   summarise my take on the three efforts; hopefully the authors of
 the
   other two will weigh in to give us their perspective.
  
  
   Zane's Proposal
   ===
  
  
 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
  
   I implemented this as a simulator of the algorithm rather than
 using
   the Heat codebase itself in order to be able to iterate rapidly on
   the design, and indeed I have changed my mind many, many times in
   the process of implementing it. Its notable departure from a
   realistic simulation is that it runs only one operation at a time -
   essentially giving up the ability to detect race conditions in
   exchange for a completely deterministic test framework. You just
   have to imagine where the locks need to be. Incidentally, the test
   framework is designed so that it can easily be ported to the actual
   Heat code base as functional tests so that the same scenarios could
   be used without modification, allowing us to have confidence that
   the eventual implementation is a faithful replication of the
   simulation (which can be rapidly experimented on, adjusted and
   tested when we inevitably run into implementation issues).
  
   This is a complete implementation of Phase 1 (i.e. using existing
   resource plugins), including update-during-update, resource
   clean-up, replace on update and rollback; with tests.
  
   Some of the design goals which were successfully incorporated:
   - Minimise changes to Heat (it's essentially a distributed version
   of the existing algorithm), and in particular to the database
   - Work with the existing plugin API
   - Limit total DB access for Resource/Stack to O(n) in the number of
   resources
   - Limit overall DB access to O(m) in the number of edges
   - Limit lock contention

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Anant Patil
Yes, that's the synchronization block for which we use the stack lock.
Currently, a thread spin waits to acquire the lock to enter this critical
section.

I don't really know how to do application level transaction. Is there an
external library for that? AFAIK, we cannot switch from DB transaction to
application level to do some computation to resolve next set of resources,
submit to async queue, and again continue with the DB transaction. The
transactional method looks attractive and clean to me, but I am limited by
my knowledge. Or do you mean to have the DB transaction object made available
via DB APIs and use them in the application? Please share your thoughts.

To avoid locking the stack, we were thinking of designating a single engine
with responsibility of processing all notifications for a stack. All the
workers will notify on the stack topic, which only one engine listens to
and then the notifications end up in a queue (local to the engine, per
stack), from where they are taken up one-by-one to continue the stack
operation. The convergence jobs are produced by this engine for the stack
it is responsible for, and they might end-up in any of the engines. But the
notifications for a stack are directed to one engine to avoid contention
for lock. The convergence load is leveled and distributed, and stack lock
is not needed.

- Anant

On Tue, Dec 2, 2014 at 1:49 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
  On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
   Hi Zane,
  
  
  
   At this stage our implementation (as mentioned in wiki
   https://wiki.openstack.org/wiki/Heat/ConvergenceDesign) achieves
 your
   design goals.
  
  
  
   1.   In case of a parallel update, our implementation adjusts graph
   according to new template and waits for dispatched resource tasks to
   complete.
  
   2.   Reason for basing our PoC on Heat code:
  
   a.   To solve contention processing parent resource by all
 dependent
   resources in parallel.
  
   b.  To avoid porting issue from PoC to HeatBase. (just to be aware
   of potential issues asap)
  
   3.   Resource timeout would be helpful, but I guess its resource
   specific and has to come from template and default values from plugins.
  
   4.   We see resource notification aggregation and processing next
   level of resources without contention and with minimal DB usage as the
   problem area. We are working on the following approaches in *parallel.*
  
   a.   Use a Queue per stack to serialize notification.
  
   b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
   convergence upon first child notification. Subsequent children who fail
   to get parent resource lock will directly send message to waiting
 parent
   task (topic=stack_id.parent_resource_id)
  
   Based on performance/feedback we can select either or a mashed version.
  
  
  
   Advantages:
  
   1.   Failed Resource tasks can be re-initiated after ProcessLog
   table lookup.
  
   2.   One worker == one resource.
  
   3.   Supports concurrent updates
  
   4.   Delete == update with empty stack
  
   5.   Rollback == update to previous known good/completed stack.
  
  
  
   Disadvantages:
  
   1.   Still holds stackLock (WIP to remove with ProcessLog)
  
  
  
   Completely understand your concern on reviewing our code, since commits
   are numerous and there is change of course at places.  Our start commit
   is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not
 help ☺
  
  
  
   Your Thoughts.
  
  
  
   Happy Thanksgiving.
  
   Vishnu.
  
  
  
   *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
   *Sent:* Thursday, November 27, 2014 9:46 AM
   *To:* OpenStack Development Mailing List (not for usage questions)
   *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept
 showdown
  
  
  
   On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter zbit...@redhat.com
   mailto:zbit...@redhat.com wrote:
  
   A bunch of us have spent the last few weeks working independently
 on
   proof of concept designs for the convergence architecture. I think
   those efforts have now reached a sufficient level of maturity that
   we should start working together on synthesising them into a plan
   that everyone can forge ahead with. As a starting point I'm going
 to
   summarise my take on the three efforts; hopefully the authors of
 the
   other two will weigh in to give us their perspective.
  
  
   Zane's Proposal
   ===
  
  
 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
  
   I implemented this as a simulator of the algorithm rather than
 using
   the Heat codebase itself in order to be able to iterate rapidly on
   the design, and indeed I have changed my mind many, many times in
   the process of implementing it. Its notable departure from a
   realistic

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-02 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-12-02 10:37:31 -0800:
 Yes, that's the synchronization block for which we use the stack lock.
 Currently, a thread spin waits to acquire the lock to enter this critical
 section.
 
 I don't really know how to do application level transaction. Is there an
 external library for that? AFAIK, we cannot switch from DB transaction to
 application level to do some computation to resolve next set of resources,
 submit to async queue, and again continue with the DB transaction. The
 transactional method looks attractive and clean to me, but I am limited by
 my knowledge. Or do you mean to have the DB transaction object made available
 via DB APIs and use them in the application? Please share your thoughts.
 

Every DB request has a context attached to the currently running
greenthread which has the DB session object. So yes, you do begin the
transaction in one db API call, and try to commit it in another after
having attempted any application logic.  The whole thing should always be
in a try/except retry loop to handle deadlocks and conflicts introduced
by multi-master sync replication like Galera.
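
Roughly this pattern, in other words (a sketch only; make_session() and
apply_changes() are placeholders, and oslo.db surfaces Galera-style conflicts
and deadlocks as DBDeadlock - it also ships decorators that wrap this loop
for you):

    from oslo_db import exception as db_exc

    MAX_RETRIES = 5

    def run_in_transaction(make_session, apply_changes):
        for _attempt in range(MAX_RETRIES):
            session = make_session()
            try:
                with session.begin():
                    apply_changes(session)   # reads, computation, writes
                return
            except db_exc.DBDeadlock:
                continue                     # lost the conflict; replay
        raise RuntimeError('gave up after %d deadlock retries' % MAX_RETRIES)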

For non-transactional backends, they would have to use a spin lock like
you're using now.

 To avoid locking the stack, we were thinking of designating a single engine
 with responsibility of processing all notifications for a stack. All the
 workers will notify on the stack topic, which only one engine listens to
 and then the notifications end up in a queue (local to the engine, per
 stack), from where they are taken up one-by-one to continue the stack
 operation. The convergence jobs are produced by this engine for the stack
 it is responsible for, and they might end-up in any of the engines. But the
 notifications for a stack are directed to one engine to avoid contention
 for lock. The convergence load is leveled and distributed, and stack lock
 is not needed.
 

No, don't do that. You already have that engine: it is the database engine,
and it is intended to be used for synchronization and will achieve a
higher degree of concurrency in bigger stacks because it will only block
two workers when they try to inspect or change the same rows.



Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-12-01 Thread Clint Byrum
Excerpts from Anant Patil's message of 2014-11-30 23:02:29 -0800:
 On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
  Hi Zane,
  
   
  
  At this stage our implementation (as mentioned in wiki
  https://wiki.openstack.org/wiki/Heat/ConvergenceDesign) achieves your
  design goals.
  
   
  
  1.   In case of a parallel update, our implementation adjusts graph
  according to new template and waits for dispatched resource tasks to
  complete.
  
  2.   Reason for basing our PoC on Heat code:
  
  a.   To solve contention processing parent resource by all dependent
  resources in parallel.
  
  b.  To avoid porting issue from PoC to HeatBase. (just to be aware
  of potential issues asap)
  
  3.   Resource timeout would be helpful, but I guess its resource
  specific and has to come from template and default values from plugins.
  
  4.   We see resource notification aggregation and processing next
  level of resources without contention and with minimal DB usage as the
  problem area. We are working on the following approaches in *parallel.*
  
  a.   Use a Queue per stack to serialize notification.
  
  b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
  convergence upon first child notification. Subsequent children who fail
  to get parent resource lock will directly send message to waiting parent
  task (topic=stack_id.parent_resource_id)
  
  Based on performance/feedback we can select either or a mashed version.
  
   
  
  Advantages:
  
  1.   Failed Resource tasks can be re-initiated after ProcessLog
  table lookup.
  
  2.   One worker == one resource.
  
  3.   Supports concurrent updates
  
  4.   Delete == update with empty stack
  
  5.   Rollback == update to previous known good/completed stack.
  
   
  
  Disadvantages:
  
  1.   Still holds stackLock (WIP to remove with ProcessLog)
  
   
  
  Completely understand your concern on reviewing our code, since commits
  are numerous and there is change of course at places.  Our start commit
  is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help ☺
  
   
  
  Your Thoughts.
  
   
  
  Happy Thanksgiving.
  
  Vishnu.
  
   
  
  *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
  *Sent:* Thursday, November 27, 2014 9:46 AM
  *To:* OpenStack Development Mailing List (not for usage questions)
  *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
  
   
  
  On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter zbit...@redhat.com
  mailto:zbit...@redhat.com wrote:
  
  A bunch of us have spent the last few weeks working independently on
  proof of concept designs for the convergence architecture. I think
  those efforts have now reached a sufficient level of maturity that
  we should start working together on synthesising them into a plan
  that everyone can forge ahead with. As a starting point I'm going to
  summarise my take on the three efforts; hopefully the authors of the
  other two will weigh in to give us their perspective.
  
  
  Zane's Proposal
  ===
  
  
  https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
  
  I implemented this as a simulator of the algorithm rather than using
  the Heat codebase itself in order to be able to iterate rapidly on
  the design, and indeed I have changed my mind many, many times in
  the process of implementing it. Its notable departure from a
  realistic simulation is that it runs only one operation at a time -
  essentially giving up the ability to detect race conditions in
  exchange for a completely deterministic test framework. You just
  have to imagine where the locks need to be. Incidentally, the test
  framework is designed so that it can easily be ported to the actual
  Heat code base as functional tests so that the same scenarios could
  be used without modification, allowing us to have confidence that
  the eventual implementation is a faithful replication of the
  simulation (which can be rapidly experimented on, adjusted and
  tested when we inevitably run into implementation issues).
  
  This is a complete implementation of Phase 1 (i.e. using existing
  resource plugins), including update-during-update, resource
  clean-up, replace on update and rollback; with tests.
  
  Some of the design goals which were successfully incorporated:
  - Minimise changes to Heat (it's essentially a distributed version
  of the existing algorithm), and in particular to the database
  - Work with the existing plugin API
  - Limit total DB access for Resource/Stack to O(n) in the number of
  resources
  - Limit overall DB access to O(m) in the number of edges
  - Limit lock contention to only those operations actually contending
  (i.e. no global locks)
  - Each worker task deals with only one resource
  - Only read

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-30 Thread Anant Patil
On 27-Nov-14 18:03, Murugan, Visnusaran wrote:
 Hi Zane,
 
  
 
 At this stage our implementation (as mentioned in wiki
 https://wiki.openstack.org/wiki/Heat/ConvergenceDesign) achieves your
 design goals.
 
  
 
 1.   In case of a parallel update, our implementation adjusts graph
 according to new template and waits for dispatched resource tasks to
 complete.
 
 2.   Reason for basing our PoC on Heat code:
 
 a.   To solve contention processing parent resource by all dependent
 resources in parallel.
 
 b.  To avoid porting issue from PoC to HeatBase. (just to be aware
 of potential issues asap)
 
 3.   Resource timeout would be helpful, but I guess its resource
 specific and has to come from template and default values from plugins.
 
 4.   We see resource notification aggregation and processing next
 level of resources without contention and with minimal DB usage as the
 problem area. We are working on the following approaches in *parallel.*
 
 a.   Use a Queue per stack to serialize notification.
 
 b.  Get parent ProcessLog (ResourceID, EngineID) and initiate
 convergence upon first child notification. Subsequent children who fail
 to get parent resource lock will directly send message to waiting parent
 task (topic=stack_id.parent_resource_id)
 
 Based on performance/feedback we can select either or a mashed version.
 
  
 
 Advantages:
 
 1.   Failed Resource tasks can be re-initiated after ProcessLog
 table lookup.
 
 2.   One worker == one resource.
 
 3.   Supports concurrent updates
 
 4.   Delete == update with empty stack
 
 5.   Rollback == update to previous known good/completed stack.
 
  
 
 Disadvantages:
 
 1.   Still holds stackLock (WIP to remove with ProcessLog)
 
  
 
 Completely understand your concern on reviewing our code, since commits
 are numerous and there is change of course at places.  Our start commit
 is [c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help ☺
 
  
 
 Your Thoughts.
 
  
 
 Happy Thanksgiving.
 
 Vishnu.
 
  
 
 *From:*Angus Salkeld [mailto:asalk...@mirantis.com]
 *Sent:* Thursday, November 27, 2014 9:46 AM
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
 
  
 
 On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter zbit...@redhat.com
 mailto:zbit...@redhat.com wrote:
 
 A bunch of us have spent the last few weeks working independently on
 proof of concept designs for the convergence architecture. I think
 those efforts have now reached a sufficient level of maturity that
 we should start working together on synthesising them into a plan
 that everyone can forge ahead with. As a starting point I'm going to
 summarise my take on the three efforts; hopefully the authors of the
 other two will weigh in to give us their perspective.
 
 
 Zane's Proposal
 ===
 
 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
 
 I implemented this as a simulator of the algorithm rather than using
 the Heat codebase itself in order to be able to iterate rapidly on
 the design, and indeed I have changed my mind many, many times in
 the process of implementing it. Its notable departure from a
 realistic simulation is that it runs only one operation at a time -
 essentially giving up the ability to detect race conditions in
 exchange for a completely deterministic test framework. You just
 have to imagine where the locks need to be. Incidentally, the test
 framework is designed so that it can easily be ported to the actual
 Heat code base as functional tests so that the same scenarios could
 be used without modification, allowing us to have confidence that
 the eventual implementation is a faithful replication of the
 simulation (which can be rapidly experimented on, adjusted and
 tested when we inevitably run into implementation issues).
 
 This is a complete implementation of Phase 1 (i.e. using existing
 resource plugins), including update-during-update, resource
 clean-up, replace on update and rollback; with tests.
 
 Some of the design goals which were successfully incorporated:
 - Minimise changes to Heat (it's essentially a distributed version
 of the existing algorithm), and in particular to the database
 - Work with the existing plugin API
 - Limit total DB access for Resource/Stack to O(n) in the number of
 resources
 - Limit overall DB access to O(m) in the number of edges
 - Limit lock contention to only those operations actually contending
 (i.e. no global locks)
 - Each worker task deals with only one resource
 - Only read resource attributes once
 
 
 Open questions:
 - What do we do when we encounter a resource that is in progress
 from a previous update while doing a subsequent update? Obviously we
 don't

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-27 Thread Jastrzebski, Michal
On 11/27/2014 at 5:15 AM, Angus Salkeld wrote: On Thu, Nov 27, 2014 
at 12:20 PM, Zane Bitter zbit...@redhat.com 
 mailto:zbit...@redhat.com wrote:
 
 A bunch of us have spent the last few weeks working independently on
 proof of concept designs for the convergence architecture. I think
 those efforts have now reached a sufficient level of maturity that
 we should start working together on synthesising them into a plan
 that everyone can forge ahead with. As a starting point I'm going to
 summarise my take on the three efforts; hopefully the authors of the
 other two will weigh in to give us their perspective.
 
 
 Zane's Proposal
 ===
 
 
 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
 
 I implemented this as a simulator of the algorithm rather than using
 the Heat codebase itself in order to be able to iterate rapidly on
 the design, and indeed I have changed my mind many, many times in
 the process of implementing it. Its notable departure from a
 realistic simulation is that it runs only one operation at a time -
 essentially giving up the ability to detect race conditions in
 exchange for a completely deterministic test framework. You just
 have to imagine where the locks need to be. Incidentally, the test
 framework is designed so that it can easily be ported to the actual
 Heat code base as functional tests so that the same scenarios could
 be used without modification, allowing us to have confidence that
 the eventual implementation is a faithful replication of the
 simulation (which can be rapidly experimented on, adjusted and
 tested when we inevitably run into implementation issues).
 
 This is a complete implementation of Phase 1 (i.e. using existing
 resource plugins), including update-during-update, resource
 clean-up, replace on update and rollback; with tests.
 
 Some of the design goals which were successfully incorporated:
 - Minimise changes to Heat (it's essentially a distributed version
 of the existing algorithm), and in particular to the database
 - Work with the existing plugin API
 - Limit total DB access for Resource/Stack to O(n) in the number of
 resources
 - Limit overall DB access to O(m) in the number of edges
 - Limit lock contention to only those operations actually contending
 (i.e. no global locks)
 - Each worker task deals with only one resource
 - Only read resource attributes once
 
 Open questions:
 - What do we do when we encounter a resource that is in progress
 from a previous update while doing a subsequent update? Obviously we
 don't want to interrupt it, as it will likely be left in an unknown
 state. Making a replacement is one obvious answer, but in many cases
 there could be serious down-sides to that. How long should we wait
 before trying it? What if it's still in progress because the engine
 processing the resource already died?
 
 
 Also, how do we implement resource level timeouts in general?
 
 
 Michał's Proposal
 =
 
 https://github.com/inc0/heat-convergence-prototype/tree/iterative
 
 Note that a version modified by me to use the same test scenario
 format (but not the same scenarios) is here:
 
 
 https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted
 
 This is based on my simulation framework after a fashion, but with
 everything implemented synchronously and a lot of handwaving about
 how the actual implementation could be distributed. The central
 premise is that at each step of the algorithm, the entire graph is
 examined for tasks that can be performed next, and those are then
 started. Once all are complete (it's synchronous, remember), the
 next step is run. Keen observers will be asking how we know when it
 is time to run the next step in a distributed version of this
 algorithm, where it will be run and what to do about resources that
 are in an intermediate state at that time. All of these questions
 remain unanswered.

Q1. When is time to run next step?
There is no real next step or previous step; there is only the step which can be run 
*right now*, which effectively means when the current step finishes, because then 
and only then are all requirements met and we can proceed.

Q2. I can see 3 main processes there:

* Converger (I'd assume that's the current engine):
A process which parses the graph and schedules next steps; it will be run after 
a change in reality is detected.

* Observer:
A process which keeps reality_db and the actual resources aligned. It's mostly for 
phase 2. This one 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-27 Thread Murugan, Visnusaran
Hi Zane,

At this stage our implementation (as mentioned in 
wiki https://wiki.openstack.org/wiki/Heat/ConvergenceDesign) achieves your 
design goals.


1.   In case of a parallel update, our implementation adjusts graph 
according to new template and waits for dispatched resource tasks to complete.

2.   Reason for basing our PoC on Heat code:

a.   To solve contention processing parent resource by all dependent 
resources in parallel.

b.  To avoid porting issue from PoC to HeatBase. (just to be aware of 
potential issues asap)

3.   Resource timeout would be helpful, but I guess its resource specific 
and has to come from template and default values from plugins.

4.   We see resource notification aggregation and processing next level of 
resources without contention and with minimal DB usage as the problem area. We 
are working on the following approaches in parallel.

a.   Use a Queue per stack to serialize notification.

b.  Get parent ProcessLog (ResourceID, EngineID) and initiate convergence 
upon first child notification. Subsequent children who fail to get parent 
resource lock will directly send message to waiting parent task 
(topic=stack_id.parent_resource_id)
Based on performance/feedback we can select either or a mashed version.

Advantages:

1.   Failed Resource tasks can be re-initiated after ProcessLog table 
lookup.

2.   One worker == one resource.

3.   Supports concurrent updates

4.   Delete == update with empty stack

5.   Rollback == update to previous known good/completed stack.

Disadvantages:

1.   Still holds stackLock (WIP to remove with ProcessLog)

Completely understand your concern on reviewing our code, since commits are 
numerous and there is change of course at places.  Our start commit is 
[c1b3eb22f7ab6ea60b095f88982247dd249139bf] though this might not help ☺

Your Thoughts.

Happy Thanksgiving.
Vishnu.

From: Angus Salkeld [mailto:asalk...@mirantis.com]
Sent: Thursday, November 27, 2014 9:46 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter 
zbit...@redhat.com wrote:
A bunch of us have spent the last few weeks working independently on proof of 
concept designs for the convergence architecture. I think those efforts have 
now reached a sufficient level of maturity that we should start working 
together on synthesising them into a plan that everyone can forge ahead with. 
As a starting point I'm going to summarise my take on the three efforts; 
hopefully the authors of the other two will weigh in to give us their 
perspective.


Zane's Proposal
===

https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph

I implemented this as a simulator of the algorithm rather than using the Heat 
codebase itself in order to be able to iterate rapidly on the design, and 
indeed I have changed my mind many, many times in the process of implementing 
it. Its notable departure from a realistic simulation is that it runs only one 
operation at a time - essentially giving up the ability to detect race 
conditions in exchange for a completely deterministic test framework. You just 
have to imagine where the locks need to be. Incidentally, the test framework is 
designed so that it can easily be ported to the actual Heat code base as 
functional tests so that the same scenarios could be used without modification, 
allowing us to have confidence that the eventual implementation is a faithful 
replication of the simulation (which can be rapidly experimented on, adjusted 
and tested when we inevitably run into implementation issues).

This is a complete implementation of Phase 1 (i.e. using existing resource 
plugins), including update-during-update, resource clean-up, replace on update 
and rollback; with tests.

Some of the design goals which were successfully incorporated:
- Minimise changes to Heat (it's essentially a distributed version of the 
existing algorithm), and in particular to the database
- Work with the existing plugin API
- Limit total DB access for Resource/Stack to O(n) in the number of resources
- Limit overall DB access to O(m) in the number of edges
- Limit lock contention to only those operations actually contending (i.e. no 
global locks)
- Each worker task deals with only one resource
- Only read resource attributes once


Open questions:
- What do we do when we encounter a resource that is in progress from a 
previous update while doing a subsequent update? Obviously we don't want to 
interrupt it, as it will likely be left in an unknown state. Making a 
replacement is one obvious answer, but in many cases there could be serious 
down-sides to that. How long should we wait before trying it? What if it's 
still in progress because the engine processing the resource already died?

Also, how do we implement resource level

[openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-26 Thread Zane Bitter
A bunch of us have spent the last few weeks working independently on 
proof of concept designs for the convergence architecture. I think those 
efforts have now reached a sufficient level of maturity that we should 
start working together on synthesising them into a plan that everyone 
can forge ahead with. As a starting point I'm going to summarise my take 
on the three efforts; hopefully the authors of the other two will weigh 
in to give us their perspective.



Zane's Proposal
===

https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph

I implemented this as a simulator of the algorithm rather than using the 
Heat codebase itself in order to be able to iterate rapidly on the 
design, and indeed I have changed my mind many, many times in the 
process of implementing it. Its notable departure from a realistic 
simulation is that it runs only one operation at a time - essentially 
giving up the ability to detect race conditions in exchange for a 
completely deterministic test framework. You just have to imagine where 
the locks need to be. Incidentally, the test framework is designed so 
that it can easily be ported to the actual Heat code base as functional 
tests so that the same scenarios could be used without modification, 
allowing us to have confidence that the eventual implementation is a 
faithful replication of the simulation (which can be rapidly 
experimented on, adjusted and tested when we inevitably run into 
implementation issues).


This is a complete implementation of Phase 1 (i.e. using existing 
resource plugins), including update-during-update, resource clean-up, 
replace on update and rollback; with tests.


Some of the design goals which were successfully incorporated:
- Minimise changes to Heat (it's essentially a distributed version of 
the existing algorithm), and in particular to the database

- Work with the existing plugin API
- Limit total DB access for Resource/Stack to O(n) in the number of 
resources

- Limit overall DB access to O(m) in the number of edges
- Limit lock contention to only those operations actually contending 
(i.e. no global locks)

- Each worker task deals with only one resource
- Only read resource attributes once

Open questions:
- What do we do when we encounter a resource that is in progress from a 
previous update while doing a subsequent update? Obviously we don't want 
to interrupt it, as it will likely be left in an unknown state. Making a 
replacement is one obvious answer, but in many cases there could be 
serious down-sides to that. How long should we wait before trying it? 
What if it's still in progress because the engine processing the 
resource already died?



Michał's Proposal
=

https://github.com/inc0/heat-convergence-prototype/tree/iterative

Note that a version modified by me to use the same test scenario format 
(but not the same scenarios) is here:


https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted

This is based on my simulation framework after a fashion, but with 
everything implemented synchronously and a lot of handwaving about how 
the actual implementation could be distributed. The central premise is 
that at each step of the algorithm, the entire graph is examined for 
tasks that can be performed next, and those are then started. Once all 
are complete (it's synchronous, remember), the next step is run. Keen 
observers will be asking how we know when it is time to run the next 
step in a distributed version of this algorithm, where it will be run 
and what to do about resources that are in an intermediate state at that 
time. All of these questions remain unanswered.


A non-exhaustive list of concerns I have:
- Replace on update is not implemented yet
- AFAIK rollback is not implemented yet
- The simulation doesn't actually implement the proposed architecture
- This approach is punishingly heavy on the database - O(n^2) or worse
- A lot of phase 2 is mixed in with phase 1 here, making it difficult to 
evaluate which changes need to be made first and whether this approach 
works with existing plugins
- The code is not really based on how Heat works at the moment, so there 
would be either a major redesign required or lots of radical changes in 
Heat or both


I think there's a fair chance that given another 3-4 weeks to work on 
this, all of these issues and others could probably be resolved. The 
question for me at this point is not so much if but why.


Michał believes that this approach will make Phase 2 easier to 
implement, which is a valid reason to consider it. However, I'm not 
aware of any particular issues that my approach would cause in 
implementing phase 2 (note that I have barely looked into it at all 
though). In fact, I very much want Phase 2 to be entirely encapsulated 
by the Resource class, so that the plugin type (legacy vs. 
convergence-enabled) is transparent to the rest of the system. Only in 
this way can we be sure that we'll be 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-26 Thread Angus Salkeld
On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter zbit...@redhat.com wrote:

 A bunch of us have spent the last few weeks working independently on proof
 of concept designs for the convergence architecture. I think those efforts
 have now reached a sufficient level of maturity that we should start
 working together on synthesising them into a plan that everyone can forge
 ahead with. As a starting point I'm going to summarise my take on the three
 efforts; hopefully the authors of the other two will weigh in to give us
 their perspective.


 Zane's Proposal
 ===

 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph

 I implemented this as a simulator of the algorithm rather than using the
 Heat codebase itself in order to be able to iterate rapidly on the design,
 and indeed I have changed my mind many, many times in the process of
 implementing it. Its notable departure from a realistic simulation is that
 it runs only one operation at a time - essentially giving up the ability to
 detect race conditions in exchange for a completely deterministic test
 framework. You just have to imagine where the locks need to be.
 Incidentally, the test framework is designed so that it can easily be
 ported to the actual Heat code base as functional tests so that the same
 scenarios could be used without modification, allowing us to have
 confidence that the eventual implementation is a faithful replication of
 the simulation (which can be rapidly experimented on, adjusted and tested
 when we inevitably run into implementation issues).

 This is a complete implementation of Phase 1 (i.e. using existing resource
 plugins), including update-during-update, resource clean-up, replace on
 update and rollback; with tests.

 Some of the design goals which were successfully incorporated:
 - Minimise changes to Heat (it's essentially a distributed version of the
 existing algorithm), and in particular to the database
 - Work with the existing plugin API
 - Limit total DB access for Resource/Stack to O(n) in the number of
 resources
 - Limit overall DB access to O(m) in the number of edges
 - Limit lock contention to only those operations actually contending (i.e.
 no global locks)
 - Each worker task deals with only one resource
 - Only read resource attributes once

 Open questions:
 - What do we do when we encounter a resource that is in progress from a
 previous update while doing a subsequent update? Obviously we don't want to
 interrupt it, as it will likely be left in an unknown state. Making a
 replacement is one obvious answer, but in many cases there could be serious
 down-sides to that. How long should we wait before trying it? What if it's
 still in progress because the engine processing the resource already died?


Also, how do we implement resource level timeouts in general?



 Michał's Proposal
 =

 https://github.com/inc0/heat-convergence-prototype/tree/iterative

 Note that a version modified by me to use the same test scenario format
 (but not the same scenarios) is here:

 https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted

 This is based on my simulation framework after a fashion, but with
 everything implemented synchronously and a lot of handwaving about how the
 actual implementation could be distributed. The central premise is that at
 each step of the algorithm, the entire graph is examined for tasks that can
 be performed next, and those are then started. Once all are complete (it's
 synchronous, remember), the next step is run. Keen observers will be asking
 how we know when it is time to run the next step in a distributed version
 of this algorithm, where it will be run and what to do about resources that
 are in an intermediate state at that time. All of these questions remain
 unanswered.


Yes, I was struggling to figure out how it could manage an IN_PROGRESS
state as it's stateless. So you end up treading on the other action's toes.
Assuming we use the resource's state (IN_PROGRESS) you could get around
that. Then you kick off a converge whenever an action completes (if there
is nothing new to be
done then do nothing).



 A non-exhaustive list of concerns I have:
 - Replace on update is not implemented yet
 - AFAIK rollback is not implemented yet
 - The simulation doesn't actually implement the proposed architecture
 - This approach is punishingly heavy on the database - O(n^2) or worse


Yes, re-reading the state of all resources whenever we run a new converge is
worrying, but I think Michal had some ideas to minimize this.


 - A lot of phase 2 is mixed in with phase 1 here, making it difficult to
 evaluate which changes need to be made first and whether this approach
 works with existing plugins
 - The code is not really based on how Heat works at the moment, so there
 would be either a major redesign required or lots of radical changes in
 Heat or both

 I think there's a fair chance that given another 3-4 weeks to work on