On Thu, Nov 27, 2014 at 12:20 PM, Zane Bitter <zbit...@redhat.com> wrote:
> A bunch of us have spent the last few weeks working independently on proof of concept designs for the convergence architecture. I think those efforts have now reached a sufficient level of maturity that we should start working together on synthesising them into a plan that everyone can forge ahead with. As a starting point I'm going to summarise my take on the three efforts; hopefully the authors of the other two will weigh in to give us their perspective.
>
> Zane's Proposal
> ===============
>
> https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
>
> I implemented this as a simulator of the algorithm rather than using the Heat codebase itself in order to be able to iterate rapidly on the design, and indeed I have changed my mind many, many times in the process of implementing it. Its notable departure from a realistic simulation is that it runs only one operation at a time - essentially giving up the ability to detect race conditions in exchange for a completely deterministic test framework. You just have to imagine where the locks need to be. Incidentally, the test framework is designed so that it can easily be ported to the actual Heat code base as functional tests so that the same scenarios could be used without modification, allowing us to have confidence that the eventual implementation is a faithful replication of the simulation (which can be rapidly experimented on, adjusted and tested when we inevitably run into implementation issues).
>
> This is a complete implementation of Phase 1 (i.e. using existing resource plugins), including update-during-update, resource clean-up, replace on update and rollback; with tests.
>
> Some of the design goals which were successfully incorporated:
> - Minimise changes to Heat (it's essentially a distributed version of the existing algorithm), and in particular to the database
> - Work with the existing plugin API
> - Limit total DB access for Resource/Stack to O(n) in the number of resources
> - Limit overall DB access to O(m) in the number of edges
> - Limit lock contention to only those operations actually contending (i.e. no global locks)
> - Each worker task deals with only one resource
> - Only read resource attributes once
>
> Open questions:
> - What do we do when we encounter a resource that is in progress from a previous update while doing a subsequent update? Obviously we don't want to interrupt it, as it will likely be left in an unknown state. Making a replacement is one obvious answer, but in many cases there could be serious down-sides to that. How long should we wait before trying it? What if it's still in progress because the engine processing the resource already died?
>
> Also, how do we implement resource level timeouts in general?
>
> Michał's Proposal
> =================
>
> https://github.com/inc0/heat-convergence-prototype/tree/iterative
>
> Note that a version modified by me to use the same test scenario format (but not the same scenarios) is here:
>
> https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted
>
> This is based on my simulation framework after a fashion, but with everything implemented synchronously and a lot of handwaving about how the actual implementation could be distributed. The central premise is that at each step of the algorithm, the entire graph is examined for tasks that can be performed next, and those are then started. Once all are complete (it's synchronous, remember), the next step is run.
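To make sure I'm reading that premise right, here's a minimal synchronous sketch of what I understand the algorithm to be (Python; all the names here are mine, not from Michał's branch):

    class Resource(object):
        def __init__(self, name):
            self.name = name
            self.state = 'INIT'

    def converge(graph):
        # graph maps resource name -> set of names it depends on.
        resources = dict((n, Resource(n)) for n in graph)
        while not all(r.state == 'COMPLETE' for r in resources.values()):
            # Each step re-examines the *entire* graph for unblocked tasks.
            ready = [n for n, deps in graph.items()
                     if resources[n].state == 'INIT' and
                     all(resources[d].state == 'COMPLETE' for d in deps)]
            if not ready:
                raise RuntimeError('nothing runnable: cycle or stuck resource')
            # Start everything that's unblocked, then wait for the whole
            # batch to finish (synchronously) before rescanning.
            for n in ready:
                resources[n].state = 'IN_PROGRESS'
                print('converging %s' % n)  # stand-in for the real action
                resources[n].state = 'COMPLETE'

    converge({'server': set(['network', 'volume']),
              'network': set(),
              'volume': set()})

The "wait for the whole batch" barrier between steps is exactly the bit with no obvious distributed equivalent.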
> Keen observers will be asking how we know when it is time to run the next step in a distributed version of this algorithm, where it will be run and what to do about resources that are in an intermediate state at that time. All of these questions remain unanswered.
>
Yes, I was struggling to figure out how it could manage an IN_PROGRESS state as it's stateless, so you end up treading on the other action's toes. Assuming we use the resource's state (IN_PROGRESS) you could get around that. Then you kick off a converge whenever an action completes (if there is nothing new to be done then do nothing) - see the sketch at the end of this section.
>
> A non-exhaustive list of concerns I have:
> - Replace on update is not implemented yet
> - AFAIK rollback is not implemented yet
> - The simulation doesn't actually implement the proposed architecture
> - This approach is punishingly heavy on the database - O(n^2) or worse
>
Yes, re-reading the state of all resources whenever we run a new converge is worrying, but I think Michał had some ideas to minimize this.
>
> - A lot of phase 2 is mixed in with phase 1 here, making it difficult to evaluate which changes need to be made first and whether this approach works with existing plugins
> - The code is not really based on how Heat works at the moment, so there would be either a major redesign required or lots of radical changes in Heat or both
>
> I think there's a fair chance that given another 3-4 weeks to work on this, all of these issues and others could probably be resolved. The question for me at this point is not so much "if" but "why".
>
> Michał believes that this approach will make Phase 2 easier to implement, which is a valid reason to consider it. However, I'm not aware of any particular issues that my approach would cause in implementing phase 2 (note that I have barely looked into it at all though). In fact, I very much want Phase 2 to be entirely encapsulated by the Resource class, so that the plugin type (legacy vs. convergence-enabled) is transparent to the rest of the system. Only in this way can we be sure that we'll be able to maintain support for legacy plugins. So a phase 1 that mixes in aspects of phase 2 is actually a bad thing in my view.
>
> I really appreciate the effort that has gone into this already, but in the absence of specific problems with building phase 2 on top of another approach that are solved by this one, I'm ready to call this a distraction.
>
In its defence, I like the simplicity of it. The concepts and code are easy to understand, though part of that is that it doesn't yet implement everything on your list.
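To be concrete about the IN_PROGRESS idea above, something like this (a sketch only, with made-up names; the state flip would need to be an atomic compare-and-swap style UPDATE in the database, not a plain assignment):

    class Res(object):
        def __init__(self, name, deps):
            self.name, self.deps, self.state = name, deps, 'INIT'

    def converge_step(stack):
        # Start anything newly unblocked; never touch IN_PROGRESS
        # resources, so we don't tread on another action's toes.
        started = []
        for res in stack:
            if res.state != 'INIT':
                continue  # finished already, or another engine owns it
            if all(d.state == 'COMPLETE' for d in res.deps):
                res.state = 'IN_PROGRESS'  # must be atomic for real
                started.append(res)
        return started

    def on_action_complete(stack, res):
        # Any completed action kicks off another converge; when nothing
        # new is runnable it's a harmless no-op.
        res.state = 'COMPLETE'
        for nxt in converge_step(stack):
            on_action_complete(stack, nxt)  # simulate the action finishing

    net, vol = Res('network', []), Res('volume', [])
    stack = [net, vol, Res('server', [net, vol])]
    for r in converge_step(stack):
        on_action_complete(stack, r)
    assert all(r.state == 'COMPLETE' for r in stack)

Even then, each converge still reads every resource's state, which is the O(n^2) load on the database, unless the completion trigger carries enough information to narrow the scan.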
> Anant & Friends' Proposal
> =========================
>
> First off, I have found this very difficult to review properly since the code is not separate from the huge mass of Heat code and nor is the commit history in the form that patch submissions would take (but rather includes backtracking and iteration on the design). As a result, most of the information here has been gleaned from discussions about the code rather than direct review. I have repeatedly suggested that this proof of concept work should be done using the simulator framework instead, unfortunately so far to no avail.
>
> The last we heard on the mailing list about this, resource clean-up had not yet been implemented. That was a major concern because that is the more difficult half of the algorithm. Since then there have been a lot more commits, but it's not yet clear whether resource clean-up, update-during-update, replace-on-update and rollback have been implemented, though it is clear that at least some progress has been made on most or all of them. Perhaps someone can give us an update.
>
> https://github.com/anantpatil/heat-convergence-poc
>
> AIUI this code also mixes phase 2 with phase 1, which is a concern. For me the highest priority for phase 1 is to be sure that it works with existing plugins. Not only because we need to continue to support them, but because converting all of our existing 'integration-y' unit tests to functional tests that operate in a distributed system is virtually impossible in the time frame we have available. So the existing test code needs to stick around, and the existing stack create/update/delete mechanisms need to remain in place until such time as we have equivalent functional test coverage to begin eliminating existing unit tests. (We'll also, of course, need to have unit tests for the individual elements of the new distributed workflow, functional tests to confirm that the distributed workflow works in principle as a whole - the scenarios from the simulator can help with _part_ of this - and, not least, an algorithm that is as similar as possible to the current one so that our existing tests remain at least somewhat representative and don't require too many major changes themselves.)
>
> Speaking of tests, I gathered that this branch included tests, but I don't know to what extent there are automated end-to-end functional tests of the algorithm?
>
> From what I can gather, the approach seems broadly similar to the one I eventually settled on also. The major difference appears to be in how we merge two or more streams of execution (i.e. when one resource depends on two or more others). In my approach, the dependencies are stored in the resources and each joining of streams creates a database row to track it, which is easily locked with contention on the lock extending only to those resources which are direct dependencies of the one waiting. In this approach, both the dependencies and the progress through the graph are stored in a database table, necessitating (a) reading of the entire table (as it relates to the current stack) on every resource operation, and (b) locking of the entire table (which is hard) when marking a resource operation complete.
>
> I chatted to Anant about this today and he mentioned that they had solved the locking problem by dispatching updates to a queue that is read by a single engine per stack.
>
> My approach also has the neat side-effects of pushing the data required to resolve get_resource and get_attr (without having to reload the resources again and query them) as well as to update dependencies (e.g. because of a replacement or deletion) along with the flow of triggers. I don't know if anything similar is at work here.
>
> It's entirely possible that the best design might combine elements of both approaches.
>
> The same open questions I detailed under my proposal also apply to this one, if I understand correctly.
>
> I'm certain that I won't have represented everyone's work fairly here, so I encourage folks to dive in and correct any errors about theirs and ask any questions you might have about mine. (In case you have been living under a rock, note that I'll be out of the office for the rest of the week due to Thanksgiving so don't expect immediate replies.)
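On the stream-merging difference: the way I picture the per-join sync point is roughly the following (made-up names; the real version would presumably be an UPDATE ... WHERE against a sync-point table rather than an in-process lock):

    import threading

    class SyncPoint(object):
        # One row per join; contention is limited to the direct deps.
        def __init__(self, required):
            self.waiting = set(required)  # names of unfinished deps
            self.data = {}                # attributes pushed by each dep
            self.lock = threading.Lock()  # stands in for a row-level lock

        def satisfy(self, dep_name, dep_data):
            # Mark one dependency done; return True only for the last one.
            with self.lock:
                self.waiting.discard(dep_name)
                self.data[dep_name] = dep_data
                return not self.waiting

    server_join = SyncPoint(['network', 'volume'])
    server_join.satisfy('network', {'id': 'net-123'})  # not last: no-op
    if server_join.satisfy('volume', {'id': 'vol-456'}):
        # The last dependency in triggers the waiting resource, and the
        # accumulated data means get_resource/get_attr can be resolved
        # without re-reading the dependencies from the DB.
        print('start server with %s' % server_join.data)

If I've got that right, it would explain why contention is confined to the direct dependencies of the waiting resource and why attributes only need to be read once.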
> I also think this would be a great time for the wider Heat community to dive in and start asking questions and suggesting ideas. We need to, ahem, converge on a shared understanding of the design so we can all get to work delivering it for Kilo.
>
Agree, we need to get moving on this.

-Angus

> cheers,
> Zane.
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev