Next week, during the VMwareAPI subteam meeting, I would like to discuss blueprint priority order and tentative scheduling for Juno. I have a proposal for the order that I would like to put to a formal vote, and I hope that we as a community can abide by the vote's results.
In short, we currently have a number of blueprints in flight that were IceHouse near-misses, and new features are already going to be starved for reviewer attention. Adding *more* features is likely to make the problem worse. I am advocating for refactorings first and features later.

If you have not read the following, please do:
http://lists.openstack.org/pipermail/openstack-dev/2014-February/028077.html
It's good background and dovetails with this topic.

There is a tl;dr at the end.

== Summary ==

I used to send out weekly blueprint, bug, and review tracking emails focused on VMware-related changes. I've stopped doing that because I have not seen a return on the investment of making those updates to the community. In this public retrospective on IceHouse, I hope to shed light on which practices were working and which were not.

== A description of the problem ==

We can't get features merged upstream. Many people are expending effort, that effort is not being rewarded, and the driver's evolution is suffering for it. I have been observing the VMware drivers' development since Havana opened for submissions back in early 2013, and I think we have a pattern that we as a community need to address. By community, I mean those of us committing to the vmwareapi drivers in Nova.

I recall working with developers in the broader community (not VMware employees) to get new features into the Nova driver for vCenter. I also recall all too well that we just missed merging in Havana-1. In fact, none of the blueprints I had been tracking back then merged; they all slipped to Havana-2. We worked very hard, and most blueprints still missed Havana-3, with only a handful of exceptions.

During Havana I refrained from suggesting large changes because I was new to the community and any such change risked "blowing up" other developers' work. Big changes can be very disruptive even when they are for good causes. So no major refactoring work occurred.
In IceHouse I started tracking things much more thoroughly. This was the first time we had a significant number of developers to coordinate, and we had roughly a dozen blueprints proposing new features for the Nova driver for vCenter in IceHouse. A significant number of these were ready (by our group's standards) for IceHouse-2. These all slipped to IceHouse-3, in the same way that all the Havana-2 blueprints had slipped. Finally, I3 followed the same pattern as H3, with only a small set of features surviving the gauntlet. In IceHouse, only two of the dozen blueprints our driver sub-team had in flight managed to land.

In the retrospective detail paste below, I've consolidated the notes I made on blueprint progress throughout IceHouse. Snapshots of these notes are publicly available in the IRC logs for the VMwareAPI sub-team if anyone would like to verify my summary of events.

IceHouse retrospective detail: http://paste.openstack.org/raw/74393/
VMwareAPI team meeting details: https://wiki.openstack.org/wiki/Meetings/VMwareAPI#Next_Meeting

== Learning from Successes ==

Of the thousands of person hours spent by VMware staff and non-staff on the VMwareAPI drivers, only a handful of feature patches merged. Why is that? I have listed every merged feature patch I could quickly find in the retrospective detail linked above.

One particularly difficult merge was https://review.openstack.org/#/c/56416/ which stands at an astonishing 74 revisions and took four months of concentrated effort to land a 744-line change in a driver of roughly 13,000 lines (including the tests). That is about a 5.6% change to the driver's code base, at a cost of four months of effort and thousands of person hours across multiple companies. Not to mention the developer's personal sacrifice as they worked nights and weekends to make those 744 lines happen.
In that time, we see the code under review come into conflict with another high-priority feature:

* https://review.openstack.org/#/c/56416/60/nova/virt/vmwareapi/vmops.py

which causes both blueprints to be revised:

* https://review.openstack.org/#/c/63084/23/nova/virt/vmwareapi/vmops.py

March 6th becomes a very busy and confusing day as the two attention-starved blueprints are wrestled into the code base. I'll leave parsing the details as an exercise for the reader; the interaction between these two patches is interesting enough to be worth closer examination.

== A common complaint ==

Common complaints about the Nova VMware driver that you will find elsewhere on this mailing list include (paraphrased):

* I can't tell where something is tested or how
* The code is hard to follow, so I hate reviewing that code base
* I can't propose a change because so much is in conflict
* Who is working on what?

We can't really single out any particular failed blueprint as a prime example. In short, all of these misses are 'misses' because they all starved for core-reviewer attention. The lesson here is that expending a great amount of effort does not guarantee a successful merge.

== A call to action ==

Consider the pattern (for the VMware driver):

H1 - 0 new features
H2 - 0 new features
I1 - 0 new features
I2 - 0 new features

If J1 were also to see 0 new features in the VMware driver, this would not be out of the ordinary. In fact, if J1 were to merge *anything* not bug-related for the VMware drivers, that would be a *significant* improvement in the state of affairs for this driver. So even if a proposal of mine were to jeopardize J1's blueprint deadlines, it would not really be anything drastic once we account for history.

== Step 1: address complaints about testing and "following" the code ==

A cadre of developers and I are currently executing on:

https://blueprints.launchpad.net/nova/+spec/vmware-spawn-refactor

This is a zero-new-feature blueprint.
It is a coordinated refactoring of some badly abused and malformed code. We expect to have the easier half of this work completed by Friday, with the more difficult portion following in very short order. This comprises a 500+ line refactor that, with core-reviewer support, we expect to land successfully *before* the Atlanta design summit.

I am also drafting this blueprint:

https://blueprints.launchpad.net/nova/+spec/vmware-vm-ref-refactor

which identifies the root cause of three Critical or High priority bugs that forced developers into late nights and long workdays as we tried to close out IceHouse RC1. This is also a pain point in the code base that is very difficult to understand. As many have pointed out, this driver has multiple implementations of the same operation, many methods that say one thing and do another, and many strange, hard-to-understand quirks. If we consolidate these, at minimum we will only have to fix each bug in one location.

There is also a merge effort with oslo.vmware, which is the start of a major refactoring of all the VMware drivers across OpenStack. Once again, it's an attempt to at least establish "how to do things" in the driver.

== Step 2: address the 'in flight' problem ==

To deal with the "I can't propose changes" problem, I want the VMwareAPI subteam to vote on the priority order of blueprints and refactorings. That means that if a blueprint conflicts with another and the group votes it lower priority, we accept that any parallel work on that feature will have to be redone when it hits a conflict. We will need to develop a dependency order and more or less agree to work to it. That doesn't mean people sit idle; it means people don't work night and day for something that won't see the light of day.

As a general rule of thumb, a new feature should probably live in a new mini-module (an object or similar), if not in its own module, to minimize merge conflicts. When it's time to include the new feature...
that feature should be "wired in" to the main flow of the driver. That means refactors and changes underneath new features should have minimal impact on the feature developer. (Hopefully these are not radical concepts to the majority of readers, but if they are, I'm more than willing to discuss in detail the ideas of Structured Programming and Object Oriented Programming as they pertain to these types of issues.)

== Step 3: who is working on what? (and what priority) ==

As I've alluded to before, in the VMwareAPI subteam we track who is working on what in etherpads. These are part of the public record, and you can see the evolution of our pads over time as we coordinate with other developers. I have not been tracking these etherpads down to the level of who is working on which method and which line. I sincerely hope we don't need to coordinate at that level. We do have a general idea of who is working in which module and what their goal is.

== In Conclusion ==

I am asserting that the reason the VMwareAPI sub-team has had a difficult time upstreaming is that core reviewers cannot be guided through why a submitted change is of high quality. This is because the driver itself is hard to understand in ways that have nothing to do with the problem domain. This is called "accidental complexity", and unless we tame it, history will repeat itself.

If we do not refactor the driver, we can already expect 0 new features in the VMware driver for Juno-1. Mandating 'refactors first' seems drastic because it destroys feature progress... but that progress *is a lie* anyway. Historically, we have seen 0 feature progress upstream in Nova on each milestone, no matter how stable or mature the feature may be.

* If we are successful in this effort, I may ask that we designate each milestone-1 as a refactoring milestone. Let's see how this goes first.
* Comments and open discussion on the order/priority of refactor blueprints will start at the next IRC meeting, next Wednesday (see the wiki for details). All refactors must have documents ready for corporate discussion over the next few days. We will vote/decide collectively the following Wednesday.
* I would advocate that the initial development cycle of every refactor effort be completed no later than the Juno design summit. That is an artificial deadline of May 13th. This should give 3 weeks for reviews and merging. It also means that if you don't make it... your refactor should be moved to the K time-frame. In other words, if you want something refactored, it has to happen in April or you lose out.

While I'm making demands... I would also like the 'K' release to end up named 'Kodiak' for personal reasons. The Kodiak is a majestic animal and I spent a great deal of my youth amongst them. I really think that's a fine name for a majestic product. But I digress.

== and if you don't like reading, a little video for you to watch ==

**tl;dr** Pay back the technical debt first, then charge up the development credit card.

Technical Debt, for those who've never heard the term before...
https://www.youtube.com/watch?v=pqeJFYwnkjE

--
# Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock
