You really need to get a real email client with quoting support ;)
On 10/12/14 06:42, Murugan, Visnusaran wrote:
Well, we still have to persist the dependencies of each version of a resource
_somehow_, because otherwise we can't know how to clean them up in the correct
order. But what I think you meant to say is that this approach doesn't require
them to be persisted in a separate table where the rows are marked as traversed
as we work through the graph.
In the case of a rollback, where we have to clean up earlier versions of resources,
we could get the order from the old template. We'd prefer not to have a graph table.
In theory you could get it by keeping old templates around. But that
means keeping a lot of templates, and it will be hard to keep track of
when you want to delete them. It also means that when starting an update
you'll need to load every existing previous version of the template in
order to calculate the dependencies. It also leaves the dependencies in
an ambiguous state when a resource fails, and although that can be
worked around it will be a giant pain to implement.
I agree that I'd prefer not to have a graph table. After trying a couple
of different things I decided to store the dependencies in the Resource
table, where we can read or write them virtually for free because it
turns out that we are always reading or updating the Resource itself at
exactly the same time anyway.
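To make that concrete, here's a very rough sketch of the idea with a made-up
'requires' column and helper - not the actual schema, just an illustration of
the dependencies being written in the same write that persists the resource's
state:

import json

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Resource(Base):
    __tablename__ = 'resource'

    id = sa.Column(sa.String(36), primary_key=True)
    status = sa.Column(sa.String(255))
    # Made-up column: IDs of the resources this one directly requires.
    requires = sa.Column(sa.Text)


def persist_resource(session, resource, status, required_ids):
    # The dependencies ride along with the state write we do anyway,
    # so reading or writing them costs no extra round trip.
    resource.status = status
    resource.requires = json.dumps(sorted(required_ids))
    session.add(resource)
    session.commit()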
This approach reduces DB queries by waiting for a completion notification on a topic. The
drawback I see is that the delete stack stream will be huge, as it will contain the entire graph.
We can always dump such data into ResourceLock.data as JSON and pass a simple flag
"load_stream_from_db" to the converge RPC call as a workaround for the delete operation.
This seems to be essentially equivalent to my 'SyncPoint' proposal, with the
key difference that the data is stored in-memory in a Heat engine rather than
in the database.
I suspect it's probably a mistake to move it in-memory for similar reasons to
the argument Clint made against synchronising the marking off of dependencies
in-memory. The database can handle that, and the problem of making the DB robust
against failures of a single machine has already been solved by someone else.
If we do it in-memory we are just creating a single point of failure for not
much gain. (I guess you could argue it doesn't matter, since if any Heat engine
dies during the traversal then we'll have to kick off another one anyway, but
it does limit our options if that changes in the future.)
[Murugan, Visnusaran] A resource completes, removes itself from resource_lock and
notifies the engine. The engine will acquire the parent lock and initiate the parent
only if all of its children are satisfied (no child entry in resource_lock). This will
take the place of the Aggregator.
Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what
I did. The three differences I can see are:
1) I think you are proposing to create all of the sync points at the
start of the traversal, rather than on an as-needed basis. This is
probably a good idea. I didn't consider it because of the way my
prototype evolved, but there's now no reason I can see not to do this.
If we could move the data to the Resource table itself then we could
even get it for free from an efficiency point of view.
2) You're using a single list from which items are removed, rather than
two lists (one static, and one to which items are added) that get
compared. Assuming (1) then this is probably a good idea too.
3) You're suggesting to notify the engine unconditionally and let the
engine decide if the list is empty. That's probably not a good idea -
not only does it require extra reads, it introduces a race condition
that you then have to solve (it can be solved, it's just more work).
Since the update to remove a child from the list is atomic, it's best to
just trigger the engine only if the list is now empty.
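For the avoidance of doubt, here's roughly what I mean by (3), sketched with
SQLAlchemy and made-up table/column names rather than anything in the tree:
the removal is a compare-and-swap update, and only the writer that empties
the list notifies the engine, so there's no extra read and no race on the
trigger.

import json

import sqlalchemy as sa

sync_point = sa.table('sync_point', sa.column('id'), sa.column('pending'))


def mark_child_complete(db_engine, sp_id, child_id, trigger_parent):
    while True:
        with db_engine.begin() as conn:
            old = conn.execute(
                sa.select(sync_point.c.pending)
                .where(sync_point.c.id == sp_id)
            ).scalar_one()
            remaining = [c for c in json.loads(old) if c != child_id]
            # Compare-and-swap: succeeds only if nobody wrote in between.
            updated = conn.execute(
                sync_point.update()
                .where(sync_point.c.id == sp_id)
                .where(sync_point.c.pending == old)
                .values(pending=json.dumps(remaining))
            ).rowcount
        if updated:
            # Only the writer that emptied the list triggers the parent.
            if not remaining:
                trigger_parent(sp_id)
            return
        # Lost the race with a sibling; re-read and retry.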
It's not clear to me how the 'streams' differ in practical terms from just
passing a serialisation of the Dependencies object, other than being
incomprehensible to me ;). The current Dependencies implementation:
1) is a very generic implementation of a DAG;
2) works and has plenty of unit tests;
3) has, with I think one exception, a pretty straightforward API;
4) has a very simple serialisation, returned by the edges() method, which can
be passed back into the constructor to recreate it; and
5) has an API that is to some extent relied upon by resources, and so won't
likely be removed outright in any event.
Whatever code we need to handle dependencies ought to just build on this
existing implementation.
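The round trip is as simple as it sounds; assuming only the interface
described above (a constructor that takes a list of edges, and an edges()
method that returns them), it looks something like this - I haven't checked
the exact call signatures, so treat those as assumptions:

from heat.engine import dependencies

deps = dependencies.Dependencies([('server', 'port'),
                                  ('port', 'net'),
                                  ('port', 'subnet')])

# Serialise: just a list of (requirer, required) edges, which is about as
# small a representation of the graph as we're going to get.
edge_list = list(deps.edges())

# Recreate the DAG on the receiving side (worker, resumed traversal, ...).
restored = dependencies.Dependencies(edge_list)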
[Murugan, Visnusaran] Our thought was to reduce the payload size (template/graph),
just planning for the worst-case scenario (a million-resource stack). We could always
dump them in ResourceLock.data to be loaded by the Worker.
If there's a smaller representation of a graph than a list of edges then
I don't know what it is. The proposed stream structure certainly isn't
it, unless you mean as an alternative to storing the entire graph once
for each resource. A better alternative is to store it once centrally -
in my current implementation it is passed down through the trigger
messages, but since only one traversal can be in progress at a time it
could just as easily be stored in the Stack table of the database at the
slight cost of an extra write.
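In other words, something like this - the 'current_deps' column is invented
for the sake of the example; the point is just one extra write at the start
of the traversal, after which every worker reads the same copy:

import json

import sqlalchemy as sa

stack = sa.table('stack', sa.column('id'), sa.column('current_deps'))


def start_traversal(conn, stack_id, deps):
    # The one extra write: persist the serialised edge list up front.
    conn.execute(
        stack.update()
        .where(stack.c.id == stack_id)
        .values(current_deps=json.dumps(list(deps.edges())))
    )


def load_dependencies(conn, stack_id, deps_class):
    raw = conn.execute(
        sa.select(stack.c.current_deps)
        .where(stack.c.id == stack_id)
    ).scalar_one()
    # JSON turns the edge tuples into lists; convert them back.
    return deps_class(tuple(edge) for edge in json.loads(raw))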
I'm not opposed to doing that, BTW. In fact, I'm really interested in
your input on how that might help make recovery from failure more
robust. I know Anant mentioned that not storing enough data to recover
when a node dies was his big concern with my current approach.
I can see that by both creating all the sync points at the start of the
traversal and storing the dependency graph in the database instead of
letting it flow through the RPC messages, we would be able to resume a
traversal where it left off, though I'm not sure what that buys us.
And I guess what you're suggesting is that by having an explicit lock
with the engine ID specified, we can detect when a resource is stuck in
IN_PROGRESS due to an engine going down? That's actually pretty interesting.
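Something along these lines, say - the engine_id column and the liveness
check are both stand-ins for whatever we actually end up with (the check
could be an RPC ping, for example), so don't read these names as real:

import sqlalchemy as sa

resource = sa.table('resource',
                    sa.column('id'),
                    sa.column('status'),
                    sa.column('engine_id'))


def find_stuck_resources(conn, is_engine_alive):
    rows = conn.execute(
        sa.select(resource.c.id, resource.c.engine_id)
        .where(resource.c.status == 'IN_PROGRESS')
    ).all()
    # Any IN_PROGRESS row whose owning engine is dead needs to be
    # picked up again by a live engine.
    return [r.id for r in rows if not is_engine_alive(r.engine_id)]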
Based on our call on Thursday, I think you're taking the idea of the Lock table
too literally. The point of referring to locks is that we can use the same
concepts as the Lock table relies on to do atomic updates on a particular row
of the database, and we can use those atomic updates to prevent race conditions
when implementing SyncPoints/Aggregators/whatever you want to call them. It's
not that we'd actually use the Lock table itself, which implements a mutex and
therefore offers only a much slower and more stateful way of doing what we want
(lock mutex, change data, unlock mutex).
[Murugan, Visnusaran] Are you suggesting something like a select-for-update in
the resource table itself, without having a lock table?
Yes, that's exactly what I was suggesting.
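i.e. something like the following, where the row itself is the lock for the
duration of one transaction and there's no lock table at all (table and
column names are illustrative only, same made-up schema as above):

import sqlalchemy as sa

resource = sa.table('resource',
                    sa.column('id'),
                    sa.column('status'),
                    sa.column('engine_id'))


def claim_resource(db_engine, resource_id, engine_id):
    """Atomically claim the resource row, or return False if we can't."""
    with db_engine.begin() as conn:
        row = conn.execute(
            sa.select(resource.c.status)
            .where(resource.c.id == resource_id)
            .with_for_update()            # row lock, released at commit
        ).one_or_none()
        if row is None or row.status == 'IN_PROGRESS':
            return False
        conn.execute(
            resource.update()
            .where(resource.c.id == resource_id)
            .values(status='IN_PROGRESS', engine_id=engine_id)
        )
        return True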