Re: [openstack-dev] [nova] Risk prediction model for OpenStack

2017-04-06 Thread Thierry Carrez
林泽燕 wrote:
> Dear everyone,
> 
> My name is Zoey Lin. I am a master's degree candidate in Computer
> Science at Peking University, China. I am currently doing research on
> OpenStack, studying the contribution composition of a code file in
> order to predict the number of defects the file is likely to have in
> the later development stages of a release.
> 
> I wonder if I could show you my study, including some metrics for the
> prediction model and a visualization tool. I would appreciate it if you
> could share your opinions or give some advice, which would really,
> really help me a lot. Thank you so much for your kindness. :)
> [...]

I'd like to echo what Jeremy said and thank you for your insightful
research. I've been interested in using risk prediction and machine
learning as a part of our review process to increase quality.

Your scientific analysis seems to match what we intuitively know: larger
files will contain more bugs than smaller files, and (beyond a few
outliers), complex files which see lots of contributions will trigger
more issues than simple files that only needed to be written once. So
I'm wondering how much of that feedback can be used to improve the code:
I think we internalize most of that risk assessment already.

One insight which I think we could take from this is that when a smaller
group of people "owns" a set of files, we raise quality (compared to
everyone owning everything). So the more we can split the code along
areas of expertise and smaller review teams, the better. But I think
that is also something we intuitively knew.

Regards,

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Risk prediction model for OpenStack

2017-04-05 Thread 林泽燕
Hi, Matt,
Thank you for your attention. The information you provided is very helpful.

Best Regards


> -----Original Message-----
> From: "Matt Riedemann" 
> Sent: 2017-04-05 23:33:21 (Wednesday)
> To: openstack-dev@lists.openstack.org
> Cc: 
> Subject: Re: [openstack-dev] [nova] Risk prediction model for OpenStack
> 

Re: [openstack-dev] [nova] Risk prediction model for OpenStack

2017-04-05 Thread 林泽燕
Hi Jeremy,
I did ignore the impact of code review. Thank you for the reminder.

Zoey Lin


> -----Original Message-----
> From: "Jeremy Stanley" 
> Sent: 2017-04-05 22:00:28 (Wednesday)
> To: "OpenStack Development Mailing List (not for usage questions)" 
> Cc: 
> Subject: Re: [openstack-dev] [nova] Risk prediction model for OpenStack
> 



Best regards!
——
Zeyan Lin
Department of Computer Science
School of Electronics Engineering & Computer Science
Peking University
Beijing 100871, China
E-mail:linze...@pku.edu.cn


Re: [openstack-dev] [nova] Risk prediction model for OpenStack

2017-04-05 Thread Matt Riedemann

On 4/5/2017 9:00 AM, Jeremy Stanley wrote:
> On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
> [...]
>> I wonder if I could show you my study, including some metrics for
>> the prediction model and a visualization tool.
> [...]
>
> I want to start out thanking you for your research and interest in
> OpenStack's development practices. I love that our contribution
> model enables such scientific analysis, a sometimes less recognized
> benefit of our community's choice to work entirely in the open. This
> specific study is also very insightful and well-presented.
>
>> In this release, 36 developers left the development of this file
>> (they made contributions in last release but not this one).
>> Developers leaving a code file deprive the file of the knowledge
>> of the decisions they have made.
> [...]
>
> One potentially influential aspect of our development model is that
> we place a heavy importance on code review. For any patch to make it
> into a branch under official revision control, it must first be
> reviewed by multiple experienced, long-standing contributors to that
> repository. Our hope is that even though some developers may cease
> contributing new patches to a file, some of them would still be
> reviewing, guiding and refining changes proposed by newer
> contributors. It doesn't seem like this behavior was captured in
> your analysis, or alternatively the fact that your model yielded
> relatively accurate predictions could imply that our review process
> has little impact on defects introduced by new commits.
>
> If you do at some point wish to try integrating review metrics into
> your analysis, our code review system has a REST API you can
> leverage, and much of the data you'd likely be interested in can be
> queried via anonymous methods such that you wouldn't even need to
> create an account. Documentation for the interface is available at
> https://review.openstack.org/Documentation/rest-api.html and we also
> have documentation of our general developer workflow at
> https://docs.openstack.org/infra/manual/developers.html as well as
> some background on our development model at
> https://docs.openstack.org/project-team-guide/open-development.html
> if that helps.
>



Jeremy pointed out what I was going to mention, which was the lack of
input on code reviews. Each major component of Nova, or virt driver,
generally has a subteam, or some sort of subject domain expert, that is
consulted or at least involved in reviewing code contributions. So while
they may not be making the changes to a component themselves, they
should be reviewing those changes. For example, with
nova/virt/libvirt/driver.py, danpb was the main core reviewer and
maintainer for that code in the past, so while he didn't write
everything, he was reviewing a lot of the contributions.


Some of the files are also skewed a bit, and you might want to take the
logic paths in a module into account when deciding whether to exclude
it. For example, exception.py and the various opts.py modules are
outliers. They are basically files that contain constants but no logic
code, so the chance of those having an actual owner is small, but so
should be the risk of bugs. They will also have a high diversity given
how common they are.


I'm not sure I understood the timeline graphs, or the point they are
making. We definitely have an ebb and flow of contributions based on the
release schedule: feature development and new code are loaded toward
the front of the release, and that is supposed to be cut off around
the 3rd milestone at the end of the release so we can stabilize and
focus on bugs.


In general, some of this is common sense. When one person "owns" most of
a module in a piece of software, they are the expert, and therefore bugs
due to a lack of understanding of the bigger picture of that module, or
of how it fits into the bigger system, should be mitigated. When that
person leaves, if others on the team don't have the domain knowledge,
there are going to be mistakes. We definitely have parts of the nova
codebase that we know are just very touchy and error prone, and we
avoid changing those if at all possible (block device mappings, quotas,
neutronv2.api, nova-network and cells v1 come to mind). This is hard in
a big open source project, but is also why we have high standards for
core reviewers (those that can approve code contributions) and a
ridiculous amount of continuous integration testing.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Risk prediction model for OpenStack

2017-04-05 Thread Jeremy Stanley
On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
[...]
> I wonder if I could show you my study, including some metrics for
> the prediction model and a visualization tool.
[...]

I want to start out thanking you for your research and interest in
OpenStack's development practices. I love that our contribution
model enables such scientific analysis, a sometimes less recognized
benefit of our community's choice to work entirely in the open. This
specific study is also very insightful and well-presented.

> In this release, 36 developers left the development of this file
> (they made contributions in last release but not this one).
> Developers leaving a code file deprive the file of the knowledge
> of the decisions they have made.
[...]

One potentially influential aspect of our development model is that
we place a heavy importance on code review. For any patch to make it
into a branch under official revision control, it must first be
reviewed by multiple experienced, long-standing contributors to that
repository. Our hope is that even though some developers may cease
contributing new patches to a file, some of them would still be
reviewing, guiding and refining changes proposed by newer
contributors. It doesn't seem like this behavior was captured in
your analysis, or alternatively the fact that your model yielded
relatively accurate predictions could imply that our review process
has little impact on defects introduced by new commits.
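
To make the distinction concrete, here is a minimal sketch of how the "developers who left" count from the quoted text could be adjusted for review activity. It assumes the analysis can produce, per file and per release, a set of commit authors and a set of reviewers; the function names and inputs are hypothetical and not part of the original study.

```python
def departed_authors(prev_authors, curr_authors):
    """Authors who committed to the file last release but not this one."""
    return set(prev_authors) - set(curr_authors)


def departed_but_reviewing(prev_authors, curr_authors, curr_reviewers):
    """Departed authors who still reviewed changes to the file this release."""
    return departed_authors(prev_authors, curr_authors) & set(curr_reviewers)


def knowledge_loss(prev_authors, curr_authors, curr_reviewers):
    """Count departed authors who are no longer involved, even as reviewers."""
    gone = departed_authors(prev_authors, curr_authors)
    return len(gone - set(curr_reviewers))
```

With this split, someone who stopped writing patches for a file but kept reviewing changes to it would no longer count toward the loss figure.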

If you do at some point wish to try integrating review metrics into
your analysis, our code review system has a REST API you can
leverage, and much of the data you'd likely be interested in can be
queried via anonymous methods such that you wouldn't even need to
create an account. Documentation for the interface is available at
https://review.openstack.org/Documentation/rest-api.html and we also
have documentation of our general developer workflow at
https://docs.openstack.org/infra/manual/developers.html as well as
some background on our development model at
https://docs.openstack.org/project-team-guide/open-development.html
if that helps.
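
As a starting point, here is a minimal sketch of building an anonymous query against the changes endpoint and reading back who voted on each change. It makes no network call itself. The helper names are hypothetical, the choice of search operators and of the DETAILED_LABELS and DETAILED_ACCOUNTS options is an assumption about what would be useful for this analysis, and the REST documentation linked above remains the authority on what the server supports.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the documentation link above.
GERRIT_URL = "https://review.openstack.org"
# Gerrit prefixes JSON responses with this line to guard against XSSI.
XSSI_PREFIX = ")]}'"


def build_changes_url(project, path, limit=25, base=GERRIT_URL):
    """Build a query URL for merged changes in a project touching a file."""
    query = f"project:{project} status:merged file:{path}"
    params = [
        ("q", query),
        ("n", str(limit)),
        ("o", "DETAILED_LABELS"),
        ("o", "DETAILED_ACCOUNTS"),
    ]
    return f"{base}/changes/?{urlencode(params)}"


def parse_gerrit_json(text):
    """Strip the XSSI prefix, if present, and decode the JSON body."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)


def reviewers_by_change(changes):
    """Map change number to the sorted names of non-zero Code-Review voters."""
    result = {}
    for change in changes:
        votes = change.get("labels", {}).get("Code-Review", {}).get("all", [])
        names = sorted(
            {v.get("name", str(v.get("_account_id"))) for v in votes if v.get("value")}
        )
        result[change["_number"]] = names
    return result
```

The URL from build_changes_url can be fetched with any HTTP client, and the response body passed through parse_gerrit_json before calling reviewers_by_change.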
-- 
Jeremy Stanley
