Hi folks,
I'd like to come to agreement on the last major questions of the convergence design. I am well aware that I am the current bottleneck, since I have been struggling to find enough time to make progress on it, but I think we are now actually very close.

I believe the last remaining issue to be addressed is the question of what to do when we want to update a resource that is still IN_PROGRESS as the result of a previous (now cancelled, obviously) update.

There are, of course, a couple of trivial and wrong ways to handle it:

1) Throw UpdateReplace and make a new one
 - This is obviously a terrible solution for the user

2) Poll the DB in a loop until the previous update finishes
 - This is obviously horribly inefficient

So the preferred solution here needs to involve retriggering the resource's task in the current update once the one from the previous update is complete.


I've implemented some changes in the simulator - although note that, unlike the stuff I implemented previously, this is extremely poorly tested (if at all), since the simulator runs the tasks serially and therefore never hits this case. So code review would be appreciated. I committed the changes on a new branch, "resumable":

https://github.com/zaneb/heat-convergence-prototype/commits/resumable

Here is a brief summary:
- The SyncPoints are now:
  * created for every resource, regardless of how many dependencies it has.
  * created at the beginning of an update and deleted before beginning another update.
  * contain only the list of satisfied dependencies (and their RefId and attribute values).
- The graph is now stored in the Stack table, rather than passed through the chain of trigger notifications.
- We'll use locks in the Resource table to ensure that only one action at a time can happen on a Resource.
- When a trigger is received for a resource that is locked (i.e. status is IN_PROGRESS and the engine owning it is still alive), the trigger is ignored.
- When processing of a resource completes, a failure to find any of the sync points that are to be notified (every resource has at least one, since the last resource in each chain must notify the stack that it is complete) indicates that the current update has been cancelled, and triggers a new check on the resource with the data for the current update (retrieved from the Stack table) if it is ready (as indicated by its SyncPoint entry). There's a rough sketch of this step below.
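
To make that last point a bit more concrete, here's a rough Python sketch of the post-completion step. It's purely illustrative: every helper name in it (sync_point_get, sync_point_satisfy, stack_get_current_traversal, sync_point_ready, retrigger_check_resource) is a placeholder I've made up for the example, not a function from the prototype or from Heat itself.

# Illustrative sketch only - all helpers below are invented placeholders,
# not real prototype or Heat code.

def check_resource_complete(resource, traversal_id, requirers):
    """After a resource check finishes, notify its dependants' sync points.

    A missing sync point means the traversal we were working on has been
    cancelled, so we re-trigger the check with the current update's data.
    """
    for requirer in requirers:
        sync_point = sync_point_get(requirer, traversal_id)
        if sync_point is None:
            # Every resource has at least one sync point to notify (the
            # last resource in a chain notifies the stack itself), so not
            # finding one means this traversal has been superseded.
            current = stack_get_current_traversal(resource.stack_id)
            if sync_point_ready(resource, current):
                # Check the resource again, this time with the data for
                # the current update (graph retrieved from the Stack table).
                retrigger_check_resource(resource, current)
            return
        # Mark this dependency satisfied, recording the RefId and
        # attribute values that the dependant resources will need.
        sync_point_satisfy(sync_point, resource.name,
                           resource.ref_id, resource.attributes)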

I'm not 100% happy with the amount of extra load this puts on the database, but I can't see a way to do significantly better and still solve this locking issue. Suggestions are welcome. At least the common case is considerably better than the worst case.

There are two races here that we need to satisfy ourselves we have answers for (I think we do):

1) Since old SyncPoints are deleted before a new transition begins and we only look for them after unlocking the resource being processed, I don't believe that both the previous and the new update can fail to trigger the check on the resource in the new update's traversal. (If there are any DB experts out there, I'd be interested in their input on this one.)

2) When both the previous and the new update end up triggering a check on the resource in the new update's traversal, we'll only perform one, because one will succeed in locking the resource and the other will just be ignored after it fails to acquire the lock. (This one is watertight, since both processes are acting on the same lock.)
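
For 2), the key point is that acquiring the lock is a single atomic statement in the database, so exactly one of the two concurrent triggers can win. Here's a minimal sketch of what that could look like with SQLAlchemy-style compare-and-swap semantics - the Resource model, column names and check_resource callback are my own inventions for illustration, not the actual schema:

# Illustrative only: minimal model and helpers invented for this sketch,
# not the actual Heat schema.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Resource(Base):
    __tablename__ = 'resource'
    id = Column(Integer, primary_key=True)
    engine_id = Column(String, nullable=True)   # NULL => unlocked


def try_acquire_lock(session, resource_id, engine_id):
    """Atomically claim the resource row for this engine.

    The UPDATE ... WHERE engine_id IS NULL is a single atomic statement,
    so of two concurrent callers exactly one sees a rowcount of 1.
    """
    updated = session.query(Resource).filter_by(
        id=resource_id, engine_id=None).update(
        {'engine_id': engine_id}, synchronize_session=False)
    return updated == 1


def on_trigger(session, resource_id, engine_id, check_resource):
    """Handle an incoming trigger; ignore it if the resource is locked."""
    if not try_acquire_lock(session, resource_id, engine_id):
        # The resource is IN_PROGRESS and owned by a live engine: the
        # other trigger already won the race, so drop this one.
        return
    check_resource(resource_id)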


I believe that this model is very close to what Anant and his team are proposing. Arguably this means I've been wasting everyone's time, but a happier way to look at it is that two mostly independent design efforts converging on a similar solution is something we can take a lot of confidence from ;)

My next task is to start breaking this down into blueprints that folks can start implementing. In the meantime, it would be great if we could identify any remaining discrepancies between the two designs and completely close those last gaps.

cheers,
Zane.
