> -----Original Message-----
> From: Zane Bitter [mailto:[email protected]]
> Sent: Friday, December 12, 2014 6:37 AM
> To: [email protected]
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
>
> On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >>> [Murugan, Visnusaran]
> >>> In case of rollback, where we have to clean up earlier versions of
> >>> resources, we could get the order from the old template. We'd prefer
> >>> not to have a graph table.
> >>
> >> In theory you could get it by keeping old templates around. But that
> >> means keeping a lot of templates, and it will be hard to keep track
> >> of when you want to delete them. It also means that when starting an
> >> update you'll need to load every existing previous version of the
> >> template in order to calculate the dependencies. It also leaves the
> >> dependencies in an ambiguous state when a resource fails, and
> >> although that can be worked around it will be a giant pain to implement.
> >>
> >
> > Agree that looking at all templates for a delete is not good. But,
> > barring complexity, we feel we could achieve it by way of having an
> > update and a delete stream for a stack update operation. I will
> > elaborate in detail in the etherpad sometime tomorrow :)
> >
> >> I agree that I'd prefer not to have a graph table. After trying a
> >> couple of different things I decided to store the dependencies in the
> >> Resource table, where we can read or write them virtually for free
> >> because it turns out that we are always reading or updating the
> >> Resource itself at exactly the same time anyway.
> >>
> >
> > Not sure how this will work in an update scenario when a resource does
> > not change and its dependencies do.
>
> We'll always update the requirements, even when the properties don't
> change.
>
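Just to make sure we are picturing the same thing, something along these
lines is what we understand by "requirements stored on the Resource row"
(a rough SQLAlchemy-style sketch; the table and column names are only
illustrative, not the actual Heat schema):

    # Rough sketch only: a simplified resource table with the dependency
    # data stored alongside the resource itself (hypothetical names, not
    # the actual Heat schema).
    import json

    import sqlalchemy as sa
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()


    class Resource(Base):
        __tablename__ = 'resource'

        id = sa.Column(sa.Integer, primary_key=True)
        name = sa.Column(sa.String(255))
        properties = sa.Column(sa.Text)
        # IDs of the resources this one requires, serialised as JSON.
        requires = sa.Column(sa.Text, default='[]')


    def store_resource(session: Session, res: Resource, new_requires):
        """Write the resource and its requirements in the same operation.

        The requirements are always rewritten, even when the properties
        are unchanged, so the graph data stays current.
        """
        res.requires = json.dumps(sorted(new_requires))
        session.add(res)
        session.commit()

Since the resource row is being read or written during the update anyway,
rewriting the requires column at the same time adds essentially no extra
DB traffic, which seems to be the point above.
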
Can you elaborate a bit on rollback? We had an approach with depends_on
and needed_by columns in ResourceTable, but dropped it when we figured
out we had too many DB operations for update.

> > Also taking care of deleting resources in order will be an issue.
>
> It works fine.
>
> > This implies that there will be different versions of a resource which
> > will complicate things even further.
>
> No it doesn't, other than the different versions we already have due to
> UpdateReplace.
>
> >>>> This approach reduces DB queries by waiting for completion
> >>>> notification on a topic. The drawback I see is that the delete stack
> >>>> stream will be huge, as it will have the entire graph. We can always
> >>>> dump such data in ResourceLock.data JSON and pass a simple flag
> >>>> "load_stream_from_db" to the converge RPC call as a workaround for
> >>>> the delete operation.
> >>>
> >>> This seems to be essentially equivalent to my 'SyncPoint'
> >>> proposal[1], with the key difference that the data is stored
> >>> in-memory in a Heat engine rather than the database.
> >>>
> >>> I suspect it's probably a mistake to move it in-memory for similar
> >>> reasons to the argument Clint made against synchronising the marking
> >>> off of dependencies in-memory. The database can handle that and the
> >>> problem of making the DB robust against failures of a single machine
> >>> has already been solved by someone else. If we do it in-memory we are
> >>> just creating a single point of failure for not much gain. (I guess
> >>> you could argue it doesn't matter, since if any Heat engine dies
> >>> during the traversal then we'll have to kick off another one anyway,
> >>> but it does limit our options if that changes in the future.)
> >> [Murugan, Visnusaran] A resource completes, removes itself from
> >> resource_lock and notifies the engine. The engine will acquire the
> >> parent lock and initiate the parent only if all its children are
> >> satisfied (no child entry in resource_lock). This will come in place
> >> of the Aggregator.
> >>
> >> Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly
> >> what I did. The three differences I can see are:
> >>
> >> 1) I think you are proposing to create all of the sync points at the
> >> start of the traversal, rather than on an as-needed basis. This is
> >> probably a good idea. I didn't consider it because of the way my
> >> prototype evolved, but there's now no reason I can see not to do this.
> >> If we could move the data to the Resource table itself then we could
> >> even get it for free from an efficiency point of view.
> >
> > +1. But we will need engine_id to be stored somewhere for recovery
> > purposes (in an easily queried format).
>
> Yeah, so I'm starting to think you're right, maybe the/a Lock table is
> the right thing to use there. We could probably do it within the resource
> table using the same select-for-update to set the engine_id, but I agree
> that we might be starting to jam too much into that one table.
>

Yeah, there would be unrelated values in the resource table, and upon
resource completion we have to unset engine_id, as compared to just
dropping a row from resource_lock. Both are good, but having engine_id in
resource_table will cut the DB operations in half. We should go with just
the resource table, along with engine_id.

> > Sync points are created as-needed. A single resource is enough to
> > restart that entire stream.
> > I think there is a disconnect in our understanding. I will detail it
> > as well in the etherpad.
>
> OK, that would be good.
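
Coming back to the engine_id idea: roughly the kind of thing we have in
mind for using the resource row itself as the lock, sketched with
SQLAlchemy Core (simplified, hypothetical schema, not the real one):

    # Rough sketch only: using the resource row itself as the "lock" by
    # setting engine_id atomically, instead of a separate Lock table.
    # Hypothetical schema, SQLAlchemy Core.
    import sqlalchemy as sa

    metadata = sa.MetaData()
    resource_table = sa.Table(
        'resource', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('engine_id', sa.String(36), nullable=True),
    )


    def acquire(conn, resource_id, engine_id):
        """Claim the resource for this engine; True if we got the lock."""
        result = conn.execute(
            resource_table.update()
            .where(resource_table.c.id == resource_id)
            .where(resource_table.c.engine_id.is_(None))
            .values(engine_id=engine_id)
        )
        return result.rowcount == 1  # 0 means another engine holds it


    def release(conn, resource_id, engine_id):
        """Unset engine_id on completion, or when recovering a dead engine."""
        conn.execute(
            resource_table.update()
            .where(resource_table.c.id == resource_id)
            .where(resource_table.c.engine_id == engine_id)
            .values(engine_id=None)
        )

The conditional UPDATE either claims the row or tells us another engine
already holds it, and release is just setting engine_id back to NULL; a
row whose engine_id points at a dead engine is how a stuck IN_PROGRESS
resource would be spotted.
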
>
> >> 2) You're using a single list from which items are removed, rather
> >> than two lists (one static, and one to which items are added) that
> >> get compared.
> >> Assuming (1) then this is probably a good idea too.
> >
> > Yeah. We have a single list per active stream, which works by removing
> > complete/satisfied resources from it.
>
> I went to change this and then remembered why I did it this way: the
> sync point is also storing data about the resources that are triggering
> it. Part of this is the RefID and attributes, and we could replace that
> by storing that data in the Resource itself and querying it rather than
> having it passed in via the notification. But the other part is the
> ID/key of those resources, which we _need_ to know in order to update
> the requirements in case one of them has been replaced and thus the
> graph doesn't reflect it yet. (Or, for that matter, we need it to know
> where to go looking for the RefId and/or attributes if they're in the
> DB.) So we have to store some data, we can't just remove items from the
> required list (although we could do that as well).
>
> >> 3) You're suggesting to notify the engine unconditionally and let the
> >> engine decide if the list is empty. That's probably not a good idea -
> >> not only does it require extra reads, it introduces a race condition
> >> that you then have to solve (it can be solved, it's just more work).
> >> Since the update to remove a child from the list is atomic, it's best
> >> to just trigger the engine only if the list is now empty.
> >>
> >
> > No. Notify only if the stream has something to be processed. The newer
> > approach based on the DB lock will be that the last resource initiates
> > its parent.
> > This is the opposite of what our Aggregator model had suggested.
>
> OK, I think we're on the same page on this one then.
>

Yeah.

> >>> It's not clear to me how the 'streams' differ in practical terms
> >>> from just passing a serialisation of the Dependencies object, other
> >>> than being incomprehensible to me ;). The current Dependencies
> >>> implementation (1) is a very generic implementation of a DAG,
> >>> (2) works and has plenty of unit tests, (3) has, with I think one
> >>> exception, a pretty straightforward API, (4) has a very simple
> >>> serialisation, returned by the edges() method, which can be passed
> >>> back into the constructor to recreate it, and (5) has an API that is
> >>> to some extent relied upon by resources, and so won't likely be
> >>> removed outright in any event.
> >>> Whatever code we need to handle dependencies ought to just build on
> >>> this existing implementation.
> >>> [Murugan, Visnusaran] Our thought was to reduce payload size
> >>> (template/graph). Just planning for the worst-case scenario (a
> >>> million-resource stack). We could always dump them in
> >>> ResourceLock.data to be loaded by the Worker.
> >>
> >> If there's a smaller representation of a graph than a list of edges
> >> then I don't know what it is. The proposed stream structure certainly
> >> isn't it, unless you mean as an alternative to storing the entire
> >> graph once for each resource. A better alternative is to store it
> >> once centrally - in my current implementation it is passed down
> >> through the trigger messages, but since only one traversal can be in
> >> progress at a time it could just as easily be stored in the Stack
> >> table of the database at the slight cost of an extra write.
> >>
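
For anyone following along, the edge-list round trip being described is
basically the following (a deliberately simplified stand-in for
illustration; the real Dependencies class carries more API than this):

    # Rough sketch only: a simplified stand-in showing why a list of edges
    # is a compact, round-trippable serialisation of the dependency graph.
    import json


    class SimpleDependencies(object):
        def __init__(self, edges):
            # Each edge is (requirer, required); 'required' may be None
            # for a node with no requirements.
            self._edges = [tuple(e) for e in edges]

        def edges(self):
            return list(self._edges)

        def required_by(self, node):
            return [r for r, req in self._edges if req == node]


    deps = SimpleDependencies([('server', 'port'),
                               ('port', 'network'),
                               ('network', None)])

    # Serialise once (e.g. into the Stack table), recreate it later.
    stored = json.dumps(deps.edges())
    restored = SimpleDependencies(json.loads(stored))
    assert restored.edges() == deps.edges()

A list of edges is both the whole graph and its own serialisation, which
is why storing it once centrally is cheap.
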
> > Agree that an edge list is the smallest representation of a graph.
> > But it does not give us a complete picture without doing a DB lookup.
> > Our assumption was to store streams in the IN_PROGRESS
> > resource_lock.data column. This could be in the resource table instead.
>
> That's true, but I think in practice at any point where we need to look
> at this we will always have already loaded the Stack from the DB for
> some other reason, so we actually can get it for free. (See the detailed
> discussion in my reply to Anant.)
>

Aren't we planning to stop loading the stack with all of its resource
objects in future, to address the scalability concerns we currently have?

> >> I'm not opposed to doing that, BTW. In fact, I'm really interested in
> >> your input on how that might help make recovery from failure more
> >> robust. I know Anant mentioned that not storing enough data to
> >> recover when a node dies was his big concern with my current approach.
> >>
> >
> > With streams, we feel recovery will be easier. All we need is a
> > trigger :)
> >
> >> I can see that by both creating all the sync points at the start of
> >> the traversal and storing the dependency graph in the database
> >> instead of letting it flow through the RPC messages, we would be able
> >> to resume a traversal where it left off, though I'm not sure what
> >> that buys us.
> >>
> >> And I guess what you're suggesting is that by having an explicit lock
> >> with the engine ID specified, we can detect when a resource is stuck
> >> in IN_PROGRESS due to an engine going down? That's actually pretty
> >> interesting.
> >>
> >
> > Yeah :)
> >
> >>> Based on our call on Thursday, I think you're taking the idea of the
> >>> Lock table too literally. The point of referring to locks is that we
> >>> can use the same concepts as the Lock table relies on to do atomic
> >>> updates on a particular row of the database, and we can use those
> >>> atomic updates to prevent race conditions when implementing
> >>> SyncPoints/Aggregators/whatever you want to call them. It's not that
> >>> we'd actually use the Lock table itself, which implements a mutex and
> >>> therefore offers only a much slower and more stateful way of doing
> >>> what we want (lock mutex, change data, unlock mutex).
> >>> [Murugan, Visnusaran] Are you suggesting something like a
> >>> select-for-update in the resource table itself, without having a
> >>> lock table?
> >>
> >> Yes, that's exactly what I was suggesting.
> >
> > The DB is always good for sync. But we need to be careful not to
> > overdo it.
>
> Yeah, I see what you mean now, it's starting to _feel_ like there'd be
> too many things mixed together in the Resource table. Are you aware of
> some concrete harm that might cause though? What happens if we overdo
> it? Is select-for-update on a huge row more expensive than the whole
> overhead of manipulating the Lock?
>
> Just trying to figure out if intuition is leading me astray here.
>

You are right. There should be no difference, apart from a little bump in
memory usage, and I think that should be fine.

> > Will update the etherpad by tomorrow.
>
> OK, thanks.
>
> cheers,
> Zane.
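
And to make the select-for-update point concrete, a sketch of the kind of
atomic sync-point update being discussed, where only the last child to
report in ends up triggering the parent (schema and names are purely
illustrative, not an actual implementation):

    # Rough sketch only: an atomic "mark this requirement satisfied and
    # report whether anything is left" update on a single row, using
    # select-for-update (needs a backend with row locks, e.g. MySQL or
    # PostgreSQL). Hypothetical schema.
    import json

    import sqlalchemy as sa

    metadata = sa.MetaData()
    sync_point = sa.Table(
        'sync_point', metadata,
        sa.Column('resource_id', sa.Integer, primary_key=True),
        # JSON list of child resource IDs still outstanding.
        sa.Column('remaining', sa.Text),
    )


    def child_complete(engine, parent_id, child_id):
        """Remove child_id from the parent's sync point.

        Returns True for exactly one caller: whichever child empties the
        list, so only that one goes on to trigger the parent resource.
        """
        with engine.begin() as conn:
            row = conn.execute(
                sa.select(sync_point.c.remaining)
                .where(sync_point.c.resource_id == parent_id)
                .with_for_update()
            ).one()
            remaining = [c for c in json.loads(row.remaining)
                         if c != child_id]
            conn.execute(
                sync_point.update()
                .where(sync_point.c.resource_id == parent_id)
                .values(remaining=json.dumps(remaining))
            )
            return not remaining

Because the row stays locked for the duration of the transaction, two
children finishing at the same time cannot both see an empty list, which
is exactly the race the atomic-update/lock discussion above is about.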
