On 13 March 2014 10:09, Matthew Booth <mbo...@redhat.com> wrote:
> On 12/03/14 18:28, Matt Riedemann wrote:
>> On 2/25/2014 6:36 AM, Matthew Booth wrote:
>>> I'm new to Nova. After some frustration with the review process,
>>> specifically in the VMware driver, I decided to try to visualise how the
>>> review process is working across Nova. To that end, I've created 2
>>> graphs, both attached to this mail.
>>>
>>> Both graphs show a nova directory tree pruned at the point that a
>>> directory contains less than 2% of total LOCs. Additionally, /tests and
>>> /locale are pruned as they make the resulting graph much busier without
>>> adding a great deal of useful information. The data for both graphs was
>>> generated from the most recent 1000 changes in gerrit on Monday 24th Feb
>>> 2014. This includes all pending changes, just over 500, and just under
>>> 500 recently merged changes.
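>>>
>>> A minimal sketch of that pruning step (illustrative names only; the
>>> real logic lives in the heatmap repo linked below):
>>>
>>>     def prune(loc_by_dir, total_locs, threshold=0.02):
>>>         """Drop subtrees below the LOC threshold, plus tests/locale."""
>>>         skip = ('nova/tests', 'nova/locale')
>>>         return dict((d, locs) for d, locs in loc_by_dir.items()
>>>                     if locs >= threshold * total_locs
>>>                     and not d.startswith(skip))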
>>>
>>> pending.svg shows the percentage of LOCs which have an outstanding
>>> change against them. This is one measure of how hard it is to write new
>>> code in Nova.
>>>
>>> merged.svg shows the average length of time between the
>>> ultimately-accepted version of a change being pushed and being approved.
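>>>
>>> In pseudo-Python, the two measurements are roughly the following
>>> (field names are illustrative, not gerrit's actual schema):
>>>
>>>     def pending_pct(pending_locs, total_locs):
>>>         """Percentage of a directory's lines touched by open changes."""
>>>         return 100.0 * pending_locs / total_locs
>>>
>>>     def mean_approval_days(changes):
>>>         """Mean days from final-patchset upload to approval."""
>>>         waits = [(c['approved'] - c['last_pushed']).days
>>>                  for c in changes]
>>>         return sum(waits) / float(len(waits))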
>>>
>>> Note that there are inaccuracies in these graphs, but they should be
>>> mostly good. Details of generation here:
>>> https://github.com/mdbooth/heatmap. This code is obviously
>>> single-purpose, but is free for re-use if anyone feels so inclined.
>>>
>>> The first graph above (pending.svg) is the one I was most interested in,
>>> and shows exactly what I expected it to. Note the size of 'vmwareapi'.
>>> If you check out Nova master, 24% of the vmwareapi driver has an
>>> outstanding change against it. It is practically impossible to write new
>>> code in vmwareapi without stomping on an outstanding patch. Compare that
>>> to the libvirt driver at a much healthier 3%.
>>>
>>> The second graph (merged.svg) is an attempt to look at why that is.
>>> Again comparing the VMware driver with the libvirt driver, we can see
>>> that at 12 days it takes much longer for a change to be approved in
>>> the VMware driver. I suspect review latency isn't the whole story,
>>> though: the backlog is likely the product of a much longer review
>>> time combined with very active development.
>>>
>>> What's the impact of this? As I said above, it obviously makes it
>>> very hard to come in as a new developer of the VMware driver when
>>> almost a quarter of it has effectively been rewritten in patches you
>>> can't yet see in master. I am very new to this and others should
>>> validate my conclusions, but I also believe this is having a
>>> detrimental impact on code quality. Remember that the above 12-day
>>> approval is only the time for the final version to be approved.
>>> If a change goes through multiple versions, each of those also has an
>>> increased review period, meaning that the time from first submission to
>>> final inclusion is typically very, very protracted. The VMware driver
>>> has its fair share of high priority issues and functionality gaps, and
>>> the developers are motivated to get it in the best possible shape as
>>> quickly as possible. However, it is my impression that when problems
>>> stem from structural issues, the developers choose to add metaphorical
>>> gaffer tape rather than fix them, because fixing both creates a
>>> dependency chain which pushes the user-visible fix months into the
>>> future. In this respect the review process is dysfunctional, and is
>>> actively detrimental to code quality.
>>>
>>> Unfortunately I'm not yet sufficiently familiar with the project to
>>> offer a solution. A core reviewer who regularly looks at it is an
>>> obvious fix. A less obvious fix might involve a process which allows
>>> developers to work on a fork which is periodically merged, rather like
>>> the kernel.
>>>
>>> Matt
>>>
>>
>> When I originally read this I had some ideas in mind for a response
>> regarding review latency with the vmware driver patches, but I felt
>> that anything I said, however honest, might sound bad or offensive in
>> some way, and I didn't want to convey that.
>>
>> But this came up in IRC today:
>>
>> https://blueprints.launchpad.net/nova/+spec/vmware-spawn-refactor
>>
>> That spurred some discussion around this same topic and I think
>> highlights one of the major issues, which is code quality and the design
>> of the driver.
>>
>> For example, the driver's spawn method is huge and there are a lot of
>> nested methods within it.  There are a lot of vmware patches and a lot
>> of blueprints, and a lot of them touch spawn.  When I'm reviewing them,
>> I'm looking for new conditions and checking to see if those are unit
>> tested (positive/error cases), and much of the time I find it very
>> difficult to tell whether they are or not.  I think a lot of that has
>> to do with how the vmware tests are scaffolded with fakes to simulate
>> a vcenter backend; unless you're extremely familiar with that code,
>> it's very hard to know whether something is covered.
>>
>> And I've actually asked in bp reviews before, 'you have this new/changed
>> condition, where is it tested' and the response is more or less 'we plan
>> on refactoring this later and will add unit tests then' or 'it's hard to
>> test this path without a lot of changes to how the tests are working'.
>> That's unacceptable to me, and I generally give up on the review after
>> that.
>
> I think part of this is a (hopefully temporary) catch-22. Major
> refactors are put off because review process latency would complicate
> and delay user visible features, and user visible features are delayed
> in review because they highlight the urgent need for major refactors. I
> think Juno will be an excellent opportunity to jump out of this cycle.

+1

Just to answer this point: despite the review latency, please don't be
tempted to think one big change will get in quicker than a series of
little, easy-to-review changes. Not all changes are equal. A large
change often scares me away towards easier-to-review patches.

It seems like, for Juno-1, it would be worth deferring all non-urgent
bug fixes and doing the refactoring we need.

I think the aim here should be better (and easier to understand) unit
test coverage. That's a great way to drive good code structure.
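
As an illustration of what I mean (names invented here, not the real
driver code), compare logic buried in a closure inside spawn() with a
module-level helper a test can reach directly:

    # Extracted from an imaginary nested function in spawn(): now it
    # has an obvious, direct unit test with no fake-vcenter scaffolding.
    def choose_disk_type(instance_meta, default='thin'):
        """Pick the vmdk disk type for a new instance (illustrative)."""
        return instance_meta.get('vmware_disktype', default)

    def test_choose_disk_type_default():
        assert choose_disk_type({}) == 'thin'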

>> So to move this all forward, I think the bp above should be the top
>> priority for the vmware team in Juno if bp patches are to keep moving
>> at the pace they do, because the features and refactoring just keep
>> coming and, at least for me, it's very easy to burn out on those
>> reviews.
>
> As I think has been mentioned elsewhere, the other big one is the move
> to the common vSphere API code in oslo, as the driver's local copy of
> that code is a huge source of cruft. This is going to require
> mechanical changes across the whole driver, so it is going to be quite
> disruptive. As this work is largely done, I would put it before even
> the spawn() refactor.

+1. Sounds like a good first step is to move to oslo.vmware.
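
From memory (check the library itself for exact signatures), the
session API it provides looks roughly like this, and would replace the
driver's private session/retry/task-polling code:

    from oslo.vmware import api, vim_util

    session = api.VMwareAPISession('vcenter-host', 'user', 'password',
                                   api_retry_count=10,
                                   task_poll_interval=0.5)
    # One shared retry/reconnect/poll implementation for every call:
    vms = session.invoke_api(vim_util, 'get_objects',
                             session.vim, 'VirtualMachine', 100)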

A big step forward towards the end of Juno is the driver-specific CI.
That really does help give reviewers more confidence that changes
don't break everything.

John

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
