On Oct 16, 2013, at 02:36, Sean Dague <s...@dague.net> wrote:

On 10/15/2013 04:54 PM, Vishvananda Ishaya wrote:
Hi Everyone,

I've been following this conversation and weighing the different sides. This is 
a tricky issue but I think it is important to decouple further and extend our 
circle of trust.

When nova started it was very easy to do feature development. As it has matured 
the pace has slowed. This is expected and necessary, but we periodically must 
make decoupling decisions or we will become mired in overhead. We did this 
already with cinder and neutron, and we have discussed doing this with virt 
drivers in the past.

We have a large number of people attempting to contribute to small sections of 
nova and getting frustrated with the process.  The perception of developers is 
much more important than the actual numbers here. If people are frustrated they 
are disincentivized to help and it hurts everyone. Suggesting that these 
contributors need to learn all of nova and help with the review queue is silly 
and makes us seem elitist. We should make it as easy as possible for new 
contributors to help.

I think our current model is breaking down at our current size and we need to 
adopt something more similar to the linux model when dealing with subsystems. 
The hyper-v team is the only one suggesting changes, but there have been 
similar concerns from the vmware team. I have no doubt that there are similar 
issues with the PowerVM, Xen, Docker, lxc and even kvm driver contributors.

The Linux kernel process works for a couple of reasons...

1) The subsystem maintainers have known each other for a solid decade (i.e. 3x 
the lifespan of the OpenStack project). Over a 10-year history of people 
doing the right things, you build trust in their judgment.

*no one* in the Linux tree was given trust first, under the hope that it would 
work out. They had to earn it, hard, by doing community work, and not just 
playing in their corner of the world.

2) This http://www.wired.com/wiredenterprise/2012/06/torvalds-nvidia-linux/ is 
completely acceptable behavior. So when someone has bad code, they are flamed 
to within an inch of their life, repeatedly, until they never ever do that 
again. This is actually a time saving measure in code review. It's a lot faster 
to just call people idiots than to help them with line by line improvements in 
their code, 10, 20, 30, or 40 iterations in gerrit.

We, as a community have decided, I think rightly, that #2 really isn't in our 
culture. But you can't start cherry picking parts of the Linux kernel community 
without considering how all the parts work together. The good and the bad are 
part of why the whole system works.

In my opinion, nova-core needs to be willing to trust the subsystem developers 
and let go of a little bit of control. I frankly don't see the drawbacks.

I actually see huge drawbacks. Culture matters. Having people active and 
willing to work on real core issues matters. The long term health of Nova 
matters.

As the QA PTL I can tell you that when you look at Nova vs. Cinder vs. Neutron, 
you'll see some very clear lines about how long it takes to get to the bottom 
of a race condition, and how many deep races are in each of them. I find this 
directly related to the stance each project has taken on whether it's socially 
acceptable to only work on your own vendor code. Nova's insistence up until 
this point that if you only play in your corner, you don't get the same 
attention is an important incentive for people to integrate and work beyond just 
their boundaries. I think diluting this part of the culture would be hugely 
detrimental to Nova.

Let's take an example that came up today, the compute_diagnostics API. This is 
an area where we've left it completely to the virt drivers to vomit up a random 
dictionary of the day for debugging reasons, and stamped it as an API. With a 
model where we let virt driver authors go hide in a corner, that's never going 
to become an API with any kind of contract, and given how much effort we've 
spent on ensuring RPC versioning and message formats, the idea that we are 
exposing a public REST endpoint that returns randomly fluctuating data based on 
date and underlying implementation is a bit saddening.
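
To make the contract point concrete, here is a minimal sketch of what a fixed 
diagnostics schema could look like. Everything in it is an assumption for 
illustration: the `Diagnostics` fields, the `driver_a`/`driver_b` names, and 
the per-driver key names are hypothetical and are not taken from any real 
virt driver or from the actual Nova API.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Diagnostics:
    """Hypothetical fixed contract for a diagnostics response."""
    driver: str
    cpu_time_ns: int
    memory_kb: int


# Hypothetical mapping from contract field -> driver-specific key.
KEY_MAPS: Dict[str, Dict[str, str]] = {
    "driver_a": {"cpu_time_ns": "cpu0_time", "memory_kb": "memory"},
    "driver_b": {"cpu_time_ns": "cpu_time", "memory_kb": "mem_kb"},
}


def normalize(driver: str, raw: Mapping[str, Any]) -> Diagnostics:
    """Translate a driver's free-form dict into the fixed contract.

    Keys the contract does not know about are dropped; keys the contract
    requires but the driver did not supply raise ValueError.
    """
    key_map = KEY_MAPS[driver]
    missing = [src for src in key_map.values() if src not in raw]
    if missing:
        raise ValueError(f"{driver} diagnostics missing keys: {missing}")
    fields = {field: int(raw[src]) for field, src in key_map.items()}
    return Diagnostics(driver=driver, **fields)
```

With something like this, `normalize("driver_a", {...})` and 
`normalize("driver_b", {...})` return the same field names regardless of what 
each driver emits internally, so the REST response no longer depends on the 
underlying implementation.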

I'm leaning towards giving control of the subtree to the team as the best 
option because it is simple and works with our current QA system. 
Alternatively, we could split out the driver into a nova subproject (2 below) 
or we could allow them to have a separate branch and do a trusted merge of all 
changes at the end of the cycle (similar to the linux model).

I hope we can come to a solution at the summit that makes all of our 
contributors want to participate more. I believe that giving people more 
responsibility inspires them to participate more fully.

I would like nothing more than all our contributors to participate more. But 
more has to mean caring about not only your stuff.

I was called out today in the hyper-v meeting because I had the audacity to -1 
a hyper-v patch because I wanted a reference in the code to the format specs, 
so that people down the road would understand why we had some new random seek 
call - 
http://eavesdrop.openstack.org/meetings/hyper_v/2013/hyper_v.2013-10-15-16.03.log.html


Sean, you got "called out" in the meeting not because you asked to put a 
reference link to the specs, which was perfectly reasonable, but because after 
we did what you asked for in a timely manner, you didn't bother to review the 
patch again until asked to please review it 6 days later!!!

This is a perfect example of why we need autonomy. We cannot leave a patch 
starving in the review queue for a critical bug like that one!!


As OpenStack grows, the single biggest factor in its success isn't going to be 
a feature in a driver, it's going to be whether this crazy complicated system holds 
together. Whether or not we've got a handle on the emergent behavior that 
happens in an asynchronous message based system, with 10s of integrated 
projects, and many dozens of daemons cross talking with each other.

I mean seriously, one of the only reasons we made it through to Havana RC phase 
is because we built a search engine based system to build statistical frequency 
analysis of unique failures on our gate resets to fully expose the top race 
conditions that had gotten so bad the gate basically locked up. And a bunch of 
people went all hands on deck to drive these out. People jumped across normal 
project lines to help on some of these top bugs, because that's what makes 
OpenStack a whole system.
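
For readers unfamiliar with the idea, frequency analysis of gate failures can 
be sketched roughly as below. This is only an illustration of the technique, 
not the actual tooling: the signature labels, the regular expressions, and the 
function names are all hypothetical, and the real system queried a search 
engine rather than scanning strings in memory.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Pattern, Tuple

# Hypothetical failure signatures: label -> pattern seen in a failed run's log.
SIGNATURES: Dict[str, Pattern[str]] = {
    "bug-A": re.compile(r"Timed out waiting for .* to become ACTIVE"),
    "bug-B": re.compile(r"Lock wait timeout exceeded"),
}


def classify(log_text: str) -> List[str]:
    """Return the labels of every known signature found in one log."""
    return [bug for bug, pattern in SIGNATURES.items() if pattern.search(log_text)]


def rank_failures(failed_run_logs: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how many failed runs hit each signature, most frequent first."""
    counts: Counter = Counter()
    for log_text in failed_run_logs:
        matched = classify(log_text)
        # Runs matching no known signature are tracked so they stay visible.
        counts.update(matched or ["unclassified"])
    return counts.most_common()
```

Sorting by hit count is what surfaces the top race conditions: the bugs that 
reset the gate most often land at the head of the list.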

Things actually looked *really* bleak for release for a while. All the people 
that helped out and got us through this deserve a huge pat on the back. That's 
what OpenStack is about.

So I feel pretty strongly that optimizing the contribution process for people 
that aren't helping with that larger problem, is the tragedy of the commons, 
and I think entirely the wrong optimization to be made.

   -Sean

--
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
