Re: [openstack-dev] [nova] Risk prediction model for OpenStack
林泽燕 wrote:
> Dear everyone,
>
> My name is Zoey Lin. I am a Master's candidate in Computer Science at
> Peking University, China. I am currently researching OpenStack,
> studying the contribution composition of a code file in order to
> predict the number of defects the file is likely to have in the later
> development stage of a release.
>
> I wonder if I could show you my study, including some metrics for the
> prediction model and a visualization tool. I would appreciate it if
> you could share your opinions or give some advice, which would really,
> really help me a lot. Thank you so much for your kindness. :)
> [...]

I'd like to echo what Jeremy said and thank you for your insightful
research. I've been interested in using risk prediction and machine
learning as part of our review process to increase quality.

Your scientific analysis seems to match what we intuitively know:
larger files will contain more bugs than smaller files, and (beyond a
few outliers) complex files which see lots of contributions will
trigger more issues than simple files that only needed to be written
once. So I'm wondering how much of that feedback can be used to
improve the code: I think we internalize most of that risk assessment
already.

One insight which I think we could take from this is that when a
smaller group of people "owns" a set of files, we raise quality
(compared to everyone owning everything). So the more we can split the
code along areas of expertise and smaller review teams, the better.
But I think that is also something we intuitively knew.

Regards,

--
Thierry Carrez (ttx)

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
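[Editor's note: Thierry's ownership point can be made measurable. The
sketch below is a hypothetical metric, not one from the study: given
(author, file) pairs mined from `git log --name-only`, it computes the
top contributor's share of changes per file, where a share near 1.0
means one person effectively "owns" the file.]

```python
from collections import Counter, defaultdict


def ownership(commits):
    """Given (author, path) pairs, return each file's top-author share.

    A value near 1.0 means one developer made most of the changes to
    the file; low values mean contributions are spread across many
    developers, the situation the thread associates with higher risk.
    """
    per_file = defaultdict(Counter)
    for author, path in commits:
        per_file[path][author] += 1
    return {
        path: counts.most_common(1)[0][1] / sum(counts.values())
        for path, counts in per_file.items()
    }


# Illustrative sample only; real data would come from `git log`.
sample = [
    ("dan", "nova/virt/libvirt/driver.py"),
    ("dan", "nova/virt/libvirt/driver.py"),
    ("alice", "nova/virt/libvirt/driver.py"),
    ("bob", "nova/exception.py"),
]
print(ownership(sample))
```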
Re: [openstack-dev] [nova] Risk prediction model for OpenStack
Hi, Matt,

Thank you for your attention. The information you provided is very
helpful.

Best Regards

> ----- Original Message -----
> From: "Matt Riedemann"
> Sent: 2017-04-05 23:33:21 (Wednesday)
> To: openstack-dev@lists.openstack.org
> Cc:
> Subject: Re: [openstack-dev] [nova] Risk prediction model for OpenStack
>
> On 4/5/2017 9:00 AM, Jeremy Stanley wrote:
> > On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
> > [...]
> >> I wonder if I could show you my study, including some metrics for
> >> the prediction model and a visualization tool.
> > [...]
> >
> > I want to start out thanking you for your research and interest in
> > OpenStack's development practices. I love that our contribution
> > model enables such scientific analysis, a sometimes less recognized
> > benefit of our community's choice to work entirely in the open. This
> > specific study is also very insightful and well-presented.
> >
> >> In this release, 36 developers left the development of this file
> >> (they made contributions in last release but not this one).
> >> Developers leaving a code file deprive the file of the knowledge
> >> of the decisions they have made.
> > [...]
> >
> > One potentially influential aspect of our development model is that
> > we place a heavy importance on code review. For any patch to make it
> > into a branch under official revision control, it must first be
> > reviewed by multiple experienced, long-standing contributors to that
> > repository. Our hope is that even though some developers may cease
> > contributing new patches to a file, some of them would still be
> > reviewing, guiding and refining changes proposed by newer
> > contributors. It doesn't seem like this behavior was captured in
> > your analysis, or alternatively the fact that your model yielded
> > relatively accurate predictions could imply that our review process
> > has little impact on defects introduced by new commits.
> >
> > If you do at some point wish to try integrating review metrics into
> > your analysis, our code review system has a REST API you can
> > leverage, and much of the data you'd likely be interested in can be
> > queried via anonymous methods such that you wouldn't even need to
> > create an account. Documentation for the interface is available at
> > https://review.openstack.org/Documentation/rest-api.html and we also
> > have documentation of our general developer workflow at
> > https://docs.openstack.org/infra/manual/developers.html as well as
> > some background on our development model at
> > https://docs.openstack.org/project-team-guide/open-development.html
> > if that helps.
>
> Jeremy pointed out what I was going to mention, which was the lack of
> input on code reviews. Each major component of Nova, and each virt
> driver, generally has a subteam, or some sort of subject-domain
> expert, that is consulted or at least involved in reviewing code
> contributions. So while they may not be making the changes themselves
> to a component, they should be reviewing those changes. For example,
> with nova/virt/libvirt/driver.py, danpb was the main core reviewer
> and maintainer for that code in the past, so while he didn't write
> everything, he was reviewing a lot of the contributions.
>
> Some of the files are also skewed a bit, and you might want to take
> the amount of logic in a module into account and exclude such files.
> For example, exception.py and the various opts.py modules are
> outliers. They are basically files that contain constants but no
> logic code, so the chance of those having an actual owner is small,
> but so should be the risk of bugs. They will also have high diversity
> given how common they are.
>
> I'm not sure I understood the timeline graphs, or the point they are
> making. We definitely have an ebb and flow of contributions based on
> the release schedule, where feature development and new code is
> loaded toward the front of the release, and then that is supposed to
> be cut off toward the 3rd milestone at the end of the release so we
> can stabilize and focus on bugs.
>
> In general some of this is common sense. When one person "owns" most
> of a module in a piece of software, they are the expert, and
> therefore bugs due to not understanding the bigger picture of that
> module, or how it fits into the bigger system, should be mitigated.
> When that person leaves, if others on the team don't have the domain
> knowledge, there are going to be mistakes. We definitely have parts
> of the nova codebase that fall into areas we know are just very
> touchy and error prone, and we avoid changing those if at all
> possible (block device mappings, quotas, neutronv2.api, nova-network
> and cells v1 come to mind).
Re: [openstack-dev] [nova] Risk prediction model for OpenStack
Hi Jeremy,

I did ignore the impact of code review. Thank you for the reminder.

Zoey Lin

> ----- Original Message -----
> From: "Jeremy Stanley"
> Sent: 2017-04-05 22:00:28 (Wednesday)
> To: "OpenStack Development Mailing List (not for usage questions)"
> Cc:
> Subject: Re: [openstack-dev] [nova] Risk prediction model for OpenStack
>
> On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
> [...]
> > I wonder if I could show you my study, including some metrics for
> > the prediction model and a visualization tool.
> [...]
>
> I want to start out thanking you for your research and interest in
> OpenStack's development practices. I love that our contribution
> model enables such scientific analysis, a sometimes less recognized
> benefit of our community's choice to work entirely in the open. This
> specific study is also very insightful and well-presented.
>
> > In this release, 36 developers left the development of this file
> > (they made contributions in last release but not this one).
> > Developers leaving a code file deprive the file of the knowledge
> > of the decisions they have made.
> [...]
>
> One potentially influential aspect of our development model is that
> we place a heavy importance on code review. For any patch to make it
> into a branch under official revision control, it must first be
> reviewed by multiple experienced, long-standing contributors to that
> repository. Our hope is that even though some developers may cease
> contributing new patches to a file, some of them would still be
> reviewing, guiding and refining changes proposed by newer
> contributors. It doesn't seem like this behavior was captured in
> your analysis, or alternatively the fact that your model yielded
> relatively accurate predictions could imply that our review process
> has little impact on defects introduced by new commits.
>
> If you do at some point wish to try integrating review metrics into
> your analysis, our code review system has a REST API you can
> leverage, and much of the data you'd likely be interested in can be
> queried via anonymous methods such that you wouldn't even need to
> create an account. Documentation for the interface is available at
> https://review.openstack.org/Documentation/rest-api.html and we also
> have documentation of our general developer workflow at
> https://docs.openstack.org/infra/manual/developers.html as well as
> some background on our development model at
> https://docs.openstack.org/project-team-guide/open-development.html
> if that helps.
> --
> Jeremy Stanley

Best regards!
——
Zeyan Lin
Department of Computer Science
School of Electronics Engineering & Computer Science
Peking University
Beijing 100871, China
E-mail: linze...@pku.edu.cn
Re: [openstack-dev] [nova] Risk prediction model for OpenStack
On 4/5/2017 9:00 AM, Jeremy Stanley wrote:
> On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
> [...]
>> I wonder if I could show you my study, including some metrics for
>> the prediction model and a visualization tool.
> [...]
>
> I want to start out thanking you for your research and interest in
> OpenStack's development practices. I love that our contribution
> model enables such scientific analysis, a sometimes less recognized
> benefit of our community's choice to work entirely in the open. This
> specific study is also very insightful and well-presented.
>
>> In this release, 36 developers left the development of this file
>> (they made contributions in last release but not this one).
>> Developers leaving a code file deprive the file of the knowledge
>> of the decisions they have made.
> [...]
>
> One potentially influential aspect of our development model is that
> we place a heavy importance on code review. For any patch to make it
> into a branch under official revision control, it must first be
> reviewed by multiple experienced, long-standing contributors to that
> repository. Our hope is that even though some developers may cease
> contributing new patches to a file, some of them would still be
> reviewing, guiding and refining changes proposed by newer
> contributors. It doesn't seem like this behavior was captured in
> your analysis, or alternatively the fact that your model yielded
> relatively accurate predictions could imply that our review process
> has little impact on defects introduced by new commits.
>
> If you do at some point wish to try integrating review metrics into
> your analysis, our code review system has a REST API you can
> leverage, and much of the data you'd likely be interested in can be
> queried via anonymous methods such that you wouldn't even need to
> create an account. Documentation for the interface is available at
> https://review.openstack.org/Documentation/rest-api.html and we also
> have documentation of our general developer workflow at
> https://docs.openstack.org/infra/manual/developers.html as well as
> some background on our development model at
> https://docs.openstack.org/project-team-guide/open-development.html
> if that helps.

Jeremy pointed out what I was going to mention, which was the lack of
input on code reviews. Each major component of Nova, and each virt
driver, generally has a subteam, or some sort of subject-domain
expert, that is consulted or at least involved in reviewing code
contributions. So while they may not be making the changes themselves
to a component, they should be reviewing those changes. For example,
with nova/virt/libvirt/driver.py, danpb was the main core reviewer and
maintainer for that code in the past, so while he didn't write
everything, he was reviewing a lot of the contributions.

Some of the files are also skewed a bit, and you might want to take
the amount of logic in a module into account and exclude such files.
For example, exception.py and the various opts.py modules are
outliers. They are basically files that contain constants but no logic
code, so the chance of those having an actual owner is small, but so
should be the risk of bugs. They will also have high diversity given
how common they are.

I'm not sure I understood the timeline graphs, or the point they are
making. We definitely have an ebb and flow of contributions based on
the release schedule, where feature development and new code is loaded
toward the front of the release, and then that is supposed to be cut
off toward the 3rd milestone at the end of the release so we can
stabilize and focus on bugs.

In general some of this is common sense. When one person "owns" most
of a module in a piece of software, they are the expert, and therefore
bugs due to not understanding the bigger picture of that module, or
how it fits into the bigger system, should be mitigated. When that
person leaves, if others on the team don't have the domain knowledge,
there are going to be mistakes. We definitely have parts of the nova
codebase that fall into areas we know are just very touchy and error
prone, and we avoid changing those if at all possible (block device
mappings, quotas, neutronv2.api, nova-network and cells v1 come to
mind). This is hard in a big open source project, but it is also why
we have high standards for core reviewers (those who can approve code
contributions) and a ridiculous amount of continuous integration
testing.

--

Thanks,

Matt
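[Editor's note: Matt's suggestion of down-weighting constants-only
modules such as exception.py could be approximated with a rough
heuristic. The sketch below is hypothetical, not part of the study: it
uses Python's ast module to estimate what fraction of a module's
statements carry control flow, so a mostly-declarative file scores
near zero and could be excluded before fitting a defect model.]

```python
import ast

# Statement types treated as "logic"; class and constant declarations
# (ClassDef, Assign) are deliberately excluded so that files like
# exception.py, which are mostly class/constant definitions, score low.
LOGIC_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With,
               ast.FunctionDef, ast.Return, ast.Raise)


def logic_ratio(source):
    """Return the fraction of statements in `source` that carry logic."""
    tree = ast.parse(source)
    stmts = [n for n in ast.walk(tree) if isinstance(n, ast.stmt)]
    if not stmts:
        return 0.0
    logic = [n for n in stmts if isinstance(n, LOGIC_NODES)]
    return len(logic) / len(stmts)


# A constants-only module scores 0.0; a function full of branching
# scores 1.0. Files below some threshold could be dropped as outliers.
print(logic_ratio("ERROR_MSG = 'boom'\nRETRIES = 3"))
print(logic_ratio("def f(x):\n    if x:\n        return 1\n    return 0"))
```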
Re: [openstack-dev] [nova] Risk prediction model for OpenStack
On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
[...]
> I wonder if I could show you my study, including some metrics for
> the prediction model and a visualization tool.
[...]

I want to start out thanking you for your research and interest in
OpenStack's development practices. I love that our contribution
model enables such scientific analysis, a sometimes less recognized
benefit of our community's choice to work entirely in the open. This
specific study is also very insightful and well-presented.

> In this release, 36 developers left the development of this file
> (they made contributions in last release but not this one).
> Developers leaving a code file deprive the file of the knowledge
> of the decisions they have made.
[...]

One potentially influential aspect of our development model is that
we place a heavy importance on code review. For any patch to make it
into a branch under official revision control, it must first be
reviewed by multiple experienced, long-standing contributors to that
repository. Our hope is that even though some developers may cease
contributing new patches to a file, some of them would still be
reviewing, guiding and refining changes proposed by newer
contributors. It doesn't seem like this behavior was captured in
your analysis, or alternatively the fact that your model yielded
relatively accurate predictions could imply that our review process
has little impact on defects introduced by new commits.

If you do at some point wish to try integrating review metrics into
your analysis, our code review system has a REST API you can
leverage, and much of the data you'd likely be interested in can be
queried via anonymous methods such that you wouldn't even need to
create an account. Documentation for the interface is available at
https://review.openstack.org/Documentation/rest-api.html and we also
have documentation of our general developer workflow at
https://docs.openstack.org/infra/manual/developers.html as well as
some background on our development model at
https://docs.openstack.org/project-team-guide/open-development.html
if that helps.
--
Jeremy Stanley
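[Editor's note: the anonymous REST interface Jeremy mentions can be
queried with plain HTTP. One wrinkle, documented in the Gerrit REST
API reference, is that JSON responses are prefixed with `)]}'` to
defeat cross-site script inclusion, and the client must strip it. A
minimal sketch follows; the query string is illustrative.]

```python
import json
import urllib.request

GERRIT = "https://review.openstack.org"
XSSI_PREFIX = ")]}'"


def parse_gerrit(raw):
    """Strip Gerrit's anti-XSSI prefix and decode the JSON body."""
    text = raw.decode("utf-8")
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)


def recent_changes(project="openstack/nova", limit=5):
    """Fetch recent changes for a project; no account is needed."""
    url = f"{GERRIT}/changes/?q=project:{project}&n={limit}"
    with urllib.request.urlopen(url) as resp:
        return parse_gerrit(resp.read())
```

Each returned change carries review metadata (owner, status,
timestamps) that could feed the model's per-file review features.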