Hi folks,
I'd like to come to agreement on the last major questions of the convergence design. I am well aware that I am the current bottleneck, since I have been struggling to find enough time to make progress on it, but I think we are now actually very close.

I believe the last remaining issue to be addressed is the question of what to do when we want to update a resource that is still IN_PROGRESS as the result of a previous (now cancelled, obviously) update.

There are, of course, a couple of trivial and wrong ways to handle it:

1) Throw UpdateReplace and make a new one
 - This is obviously a terrible solution for the user

2) Poll the DB in a loop until the previous update finishes
 - This is obviously horribly inefficient

So the preferred solution here needs to involve retriggering the resource's task in the current update once the one from the previous update is complete.


I've implemented some changes in the simulator - although note that, unlike the stuff I implemented previously, this is extremely poorly tested (if at all), since the simulator runs the tasks serially and therefore never hits this case. So code review would be appreciated. I committed the changes on a new branch, "resumable":

https://github.com/zaneb/heat-convergence-prototype/commits/resumable

Here is a brief summary:
- The SyncPoints are now:
  * created for every resource, regardless of how many dependencies it has.
  * created at the beginning of an update and deleted before beginning another update.
  * contain only the list of satisfied dependencies (and their RefId and attribute values).
- The graph is now stored in the Stack table, rather than passed through the chain of trigger notifications.
- We'll use locks in the Resource table to ensure that only one action at a time can happen on a Resource.
- When a trigger is received for a resource that is locked (i.e. status is IN_PROGRESS and the engine owning it is still alive), the trigger is ignored.
- When processing of a resource completes, a failure to find any of the sync points that are to be notified (every resource has at least one, since the last resource in each chain must notify the stack that it is complete) indicates that the current update has been cancelled, and triggers a new check on the resource with the data for the current update (retrieved from the Stack table) if it is ready (as indicated by its SyncPoint entry). There's a rough sketch of this step below.
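
To make that last point a bit more concrete, here's a rough Python sketch of the post-completion step. It's purely illustrative: every helper name in it (sync_point_get, sync_point_satisfy, stack_get_current_traversal, sync_point_ready, retrigger_check_resource) is a placeholder I've made up for the example, not a function from the prototype or from Heat itself.

# Illustrative sketch only - all helpers below are invented placeholders,
# not real prototype or Heat code.

def check_resource_complete(resource, traversal_id, requirers):
    """After a resource check finishes, notify its dependants' sync points.

    A missing sync point means the traversal we were working on has been
    cancelled, so we re-trigger the check with the current update's data.
    """
    for requirer in requirers:
        sync_point = sync_point_get(requirer, traversal_id)
        if sync_point is None:
            # Every resource has at least one sync point to notify (the
            # last resource in a chain notifies the stack itself), so not
            # finding one means this traversal has been superseded.
            current = stack_get_current_traversal(resource.stack_id)
            if sync_point_ready(resource, current):
                # Check the resource again, this time with the data for
                # the current update (graph retrieved from the Stack table).
                retrigger_check_resource(resource, current)
            return
        # Mark this dependency satisfied, recording the RefId and
        # attribute values that the dependant resources will need.
        sync_point_satisfy(sync_point, resource.name,
                           resource.ref_id, resource.attributes)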

I'm not 100% happy with the amount of extra load this puts on the database, but I can't see a way to do significantly better and still solve this locking issue. Suggestions are welcome. At least the common case is considerably better than the worst case.

There are two races here that we need to satisfy ourselves we have answers for (I think we do):

1) Since old SyncPoints are deleted before a new transition begins and we only look for them after unlocking the resource being processed, I don't believe that both the previous and the new update can fail to trigger the check on the resource in the new update's traversal. (If there are any DB experts out there, I'd be interested in their input on this one.)

2) When both the previous and the new update end up triggering a check on the resource in the new update's traversal, we'll only perform one, because one will succeed in locking the resource and the other will just be ignored after it fails to acquire the lock. (This one is watertight, since both processes are acting on the same lock.)
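
For 2), the key point is that acquiring the lock is a single atomic statement in the database, so exactly one of the two concurrent triggers can win. Here's a minimal sketch of what that could look like with SQLAlchemy-style compare-and-swap semantics - the Resource model, column names and check_resource callback are my own inventions for illustration, not the actual schema:

# Illustrative only: minimal model and helpers invented for this sketch,
# not the actual Heat schema.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Resource(Base):
    __tablename__ = 'resource'
    id = Column(Integer, primary_key=True)
    engine_id = Column(String, nullable=True)   # NULL => unlocked


def try_acquire_lock(session, resource_id, engine_id):
    """Atomically claim the resource row for this engine.

    The UPDATE ... WHERE engine_id IS NULL is a single atomic statement,
    so of two concurrent callers exactly one sees a rowcount of 1.
    """
    updated = session.query(Resource).filter_by(
        id=resource_id, engine_id=None).update(
        {'engine_id': engine_id}, synchronize_session=False)
    return updated == 1


def on_trigger(session, resource_id, engine_id, check_resource):
    """Handle an incoming trigger; ignore it if the resource is locked."""
    if not try_acquire_lock(session, resource_id, engine_id):
        # The resource is IN_PROGRESS and owned by a live engine: the
        # other trigger already won the race, so drop this one.
        return
    check_resource(resource_id)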


I believe that this model is very close to what Anant and his team are proposing. Arguably this means I've been wasting everyone's time, but a happier way to look at it is that two mostly independent design efforts converging on a similar solution is something we can take a lot of confidence from ;)

My next task is to start breaking this down into blueprints that folks can start implementing. In the meantime, it would be great if we could identify any remaining discrepancies between the two designs and completely close those last gaps.

cheers,
Zane.
