On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
> I find that list somewhat obscure, and reading your section on Code
> Authorship gives me some sense of why. All of those people have been very
> important contributors to the project, and I'd think the absence of Gaël,
> Andreas and Olivier alone would be very damaging, if only because of their
> dedication to the collaborative maintenance involved. Yet despite his top
> score Fabian has not actively contributed for years and would be quite
> unfamiliar with many of the files he created, while I think Mathieu Blondel
> and Alexandre Gramfort, for example, would provide substantial code
> coverage without those seven (although they may not be interested in the
> maintenance).
>
> I feel the approach is problematic because of the weight it puts on
> "number of commits" (if that's how I should interpret "the number of
> changes made in f by D"). Apart from the susceptibility of this measure to
> individual author preferences, the project in infancy favoured small
> commits (because the team was small), but more recently has preferred large
> contributions, and has frequently squashed contributions with large commit
> histories into single commits.
>
> Have you considered measures of "number of deliveries" apart from number
> of commits? While counting lines of code presents other problems, the
> number of months in which a user committed changes to a file might be a
> more realistic representation.
>
> A number of factors attenuate developer loss: documentation and overall
> code quality; fairly open and wide contribution, with regular in-person
> interaction for a large number of contributors; GSoC and other
> project-based involvement entailing new contributors become very familiar
> with parts of the code; and the standardness of the algorithms implemented
> in scikit-learn, meaning they can be maintained on the basis of reference
> works (a broader documentation).
>

As an extreme example, pydata/pandas has a truck factor of one. But the one
has already been "hit".

I think the truck factor can be very misleading for projects in their second
generation. For old projects like scipy or numpy (which I didn't see), the
truck factor might be quite large and take turnover into account, even if the
short-run truck factor is relatively low.

Josef

>
> On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
>
>> As part of my PhD research on code authorship, we calculated the Truck
>> Factor (TF) of some popular GitHub repositories.
>>
>> As you probably know, the Truck (or Bus) Factor designates the minimal
>> number of developers that have to be hit by a truck (or quit) before a
>> project is incapacitated. In our work, we consider that a system is in
>> trouble if more than 50% of its files become orphan (i.e., without a main
>> author).
>>
>> More details on our work in this preprint:
>> https://peerj.com/preprints/1233
>>
>> We calculated the TF for scikit-learn and obtained a value of 7.
>>
>> The developers responsible for this TF are:
>>
>> Fabian Pedregosa - author of 22% of the files
>> Gael Varoquaux - author of 13% of the files
>> Andreas Mueller - author of 12% of the files
>> Olivier Grisel - author of 10% of the files
>> Lars Buitinck - author of 10% of the files
>> Jake Vanderplas - author of 6% of the files
>> Vlad Niculae - author of 5% of the files
>>
>> To validate our results, we would like to ask scikit-learn developers the
>> following three brief questions:
>>
>> (a) Do you agree that the listed developers are the main developers of
>> scikit-learn?
>>
>> (b) Do you agree that scikit-learn will be in trouble if the listed
>> developers leave the project (e.g., if they win in the lottery, to be less
>> morbid)?
>>
>> (c) Does scikit-learn have some characteristics that would attenuate the
>> loss of the listed developers (e.g., detailed documentation)?
>>
>> Thanks in advance for your collaboration,
>>
>> Guilherme Avelino
>> PhD Student
>> Applied Software Engineering Group (ASERG)
>> UFMG, Brazil
>> http://aserg.labsoft.dcc.ufmg.br/
>>
>> --
>> Prof. Guilherme Amaral Avelino
>> Universidade Federal do Piauí
>> Departamento de Computação
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
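
To make the definition quoted above concrete, here is a minimal sketch in
Python. It is not the method of the preprint: it assumes exactly one main
author per file, and it picks that author by the number of distinct months
in which a developer committed to the file (the alternative Joel suggests)
rather than by commit count. The function names, the (developer, file, date)
input format and the toy data are all hypothetical. The truck factor is then
estimated greedily: keep removing the developer who is main author of the
most files until more than 50% of the files are orphaned.

```python
from collections import Counter, defaultdict
from datetime import date


def main_authors_by_active_months(commits):
    """Map each file to the developer active on it in the most distinct months.

    `commits` is an iterable of (developer, path, commit_date) tuples
    (a hypothetical input format). Ties go to the alphabetically first name.
    """
    months = defaultdict(set)  # (path, developer) -> {(year, month), ...}
    for developer, path, day in commits:
        months[(path, developer)].add((day.year, day.month))

    best = {}  # path -> (active_month_count, developer)
    for (path, developer), active in months.items():
        score = len(active)
        current = best.get(path)
        if (current is None or score > current[0]
                or (score == current[0] and developer < current[1])):
            best[path] = (score, developer)
    return {path: developer for path, (_, developer) in best.items()}


def truck_factor(main_author, threshold=0.5):
    """Return the developers whose removal orphans more than `threshold`
    of the files, removing the author of the most files first."""
    total = len(main_author)
    remaining = Counter(main_author.values())  # developer -> files authored
    orphaned = 0
    removed = []
    # "In trouble" means strictly more than the threshold share is orphaned.
    while remaining and orphaned <= threshold * total:
        developer = min(remaining, key=lambda d: (-remaining[d], d))
        orphaned += remaining.pop(developer)
        removed.append(developer)
    return removed


# Toy history, invented for illustration only.
commits = [
    ("ana", "a.py", date(2015, 1, 5)),
    ("ana", "a.py", date(2015, 1, 20)),  # same month, counted once
    ("bob", "a.py", date(2015, 2, 1)),
    ("bob", "a.py", date(2015, 3, 1)),
    ("ana", "b.py", date(2015, 1, 5)),
    ("bob", "c.py", date(2015, 4, 1)),
    ("cy", "d.py", date(2015, 5, 1)),
]

authors = main_authors_by_active_months(commits)
removed = truck_factor(authors)
print(len(removed), removed)  # 2 ['bob', 'ana']
```

With a single main author per file, removing the most prolific authors first
gives the smallest such set; a measure that allows several authors per file
would need a different search. Note that squashed commits affect this count
much less than a raw commit count, since a squashed contribution still
registers in the month it landed.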