I disagree.
Pandas has a truck factor of one: Jeff Reback.
My impression is that Wes has not kept up with the current codebase and
would therefore no longer be the ideal maintainer.
If there is no "hand-over" of the arcane knowledge between generations,
a project will die.
NumPy has a truck factor between two and four, I think.
On 08/12/2015 09:57 AM, josef.p...@gmail.com wrote:
On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
I find that list somewhat obscure, and reading your section on
Code Authorship gives me some sense of why. All of those people
have been very important contributors to the project, and I'd
think the absence of Gaël, Andreas and Olivier alone would be very
damaging, if only because of their dedication to the collaborative
maintenance involved. Yet despite his top score Fabian has not
actively contributed for years and would be quite unfamiliar with
many of the files he created, while I think Mathieu Blondel
and Alexandre Gramfort, for example, would provide substantial
code coverage without those seven (although they may not be
interested in the maintenance).
I feel the approach is problematic because of the weight it puts
on "number of commits" (if that's how I should interpret "the
number of changes made in f by D"). Apart from the susceptibility
of this measure to individual author preferences, the project in
its infancy favoured small commits (because the team was small), but
more recently has preferred large contributions, and has
frequently squashed contributions with large commit histories into
single commits.
Have you considered measures of "number of deliveries" apart from
number of commits? While counting lines of code presents other
problems, the number of months in which a user committed changes
to a file might be a more realistic representation.
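The difference between the two measures can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical log (the authors, file name and dates are invented, not scikit-learn data), and it is not the authorship model used in the preprint: it simply scores each author of one file by commit count and by the number of distinct calendar months in which they committed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical commit log of (author, file, commit date); invented for illustration.
# alice made many small commits in one month; bob's larger contributions were
# each squashed into a single commit, in three different months.
log = [
    ("alice", "tree.py", date(2010, 3, 1)),
    ("alice", "tree.py", date(2010, 3, 2)),
    ("alice", "tree.py", date(2010, 3, 5)),
    ("alice", "tree.py", date(2010, 3, 9)),
    ("bob", "tree.py", date(2014, 6, 20)),
    ("bob", "tree.py", date(2014, 11, 3)),
    ("bob", "tree.py", date(2015, 2, 14)),
]


def commits_per_author(log, path):
    """Number of commits each author made to `path`."""
    counts = defaultdict(int)
    for author, f, _ in log:
        if f == path:
            counts[author] += 1
    return dict(counts)


def months_per_author(log, path):
    """Number of distinct (year, month) pairs in which each author touched `path`."""
    months = defaultdict(set)
    for author, f, day in log:
        if f == path:
            months[author].add((day.year, day.month))
    return {author: len(ms) for author, ms in months.items()}


def main_author(scores):
    """Author with the highest score (ties broken by first appearance)."""
    return max(scores, key=scores.get)


print(main_author(commits_per_author(log, "tree.py")))  # alice (4 commits vs 3)
print(main_author(months_per_author(log, "tree.py")))   # bob (3 months vs 1)
```

By commit count, alice's four small commits in a single month outweigh bob's three squashed contributions; counting active months instead favours bob, whose work spans three separate months.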
A number of factors attenuate developer loss: documentation and
overall code quality; fairly open and wide contribution, with
regular in-person interaction for a large number of contributors;
GSoC and other project-based involvement, through which new
contributors become very familiar with parts of the code; and the
standardness of the algorithms implemented in scikit-learn,
meaning they can be maintained on the basis of reference works (a
broader documentation).
As an extreme example, pydata/pandas has a truck factor of one. But that one
has already been "hit".
I think the truck factor can be very misleading for projects in the
second generation.
For old projects like scipy or numpy (which I didn't see), the
truck factor might be quite large if it takes turnover into account, even
if the short-run truck factor is relatively low.
Josef
On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
As part of my PhD research on code authorship, we calculated
the Truck Factor (TF) of some popular GitHub repositories.
As you probably know, the Truck (or Bus) Factor designates the
minimal number of developers that have to be hit by a truck
(or quit) before a project is incapacitated. In our work, we
consider that a system is in trouble if more than 50% of its
files become orphaned (i.e., without a main author).
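Under this definition, the truck factor can be estimated greedily. The Python sketch below assumes each file has already been mapped to its set of main authors (how the preprint identifies main authors is not reproduced here). It repeatedly removes the developer who is main author of the most files still covered, stopping once more than 50% of the files are orphaned. The file and developer names are hypothetical, and this illustrates the idea rather than the study's exact implementation.

```python
from collections import Counter


def truck_factor(authors_by_file, threshold=0.5):
    """Greedy truck-factor estimate.

    authors_by_file maps each file to the set of its main authors. Returns the
    number of developers removed and the removal order.
    """
    # Copy the sets so the caller's mapping is not mutated.
    files = {f: set(authors) for f, authors in authors_by_file.items()}
    if not files:
        return 0, []
    removed = []
    while True:
        orphaned = sum(1 for authors in files.values() if not authors)
        if orphaned / len(files) > threshold:
            break  # more than `threshold` of the files have no main author left
        # Count, for each remaining developer, how many files they still author.
        coverage = Counter(a for authors in files.values() for a in authors)
        if not coverage:
            break  # only reachable when threshold >= 1
        top, _ = coverage.most_common(1)[0]
        removed.append(top)
        for authors in files.values():
            authors.discard(top)
    return len(removed), removed


# Hypothetical project: 8 files, 3 developers.
example = {
    "a.py": {"ann"}, "b.py": {"ann"}, "c.py": {"ann"},
    "d.py": {"ann", "ben"}, "e.py": {"ben"}, "f.py": {"ben"},
    "g.py": {"cat"}, "h.py": {"cat"},
}
print(truck_factor(example))  # (2, ['ann', 'ben'])
```

Removing ann alone orphans only 3 of the 8 files (37.5%); removing ben as well orphans 6 of 8 (75%), so this toy project's truck factor is 2.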
More details on our work in this preprint:
https://peerj.com/preprints/1233
We calculated the TF for scikit-learn and obtained a value of 7.
The developers responsible for this TF are:
Fabian Pedregosa - author of 22% of the files
Gael Varoquaux - author of 13% of the files
Andreas Mueller - author of 12% of the files
Olivier Grisel - author of 10% of the files
Lars Buitinck - author of 10% of the files
Jake Vanderplas - author of 6% of the files
Vlad Niculae - author of 5% of the files
To validate our results, we would like to ask scikit-learn
developers the following three brief questions:
(a) Do you agree that the listed developers are the main
developers of scikit-learn?
(b) Do you agree that scikit-learn will be in trouble if the
listed developers leave the project (e.g., if they win the
lottery, to be less morbid)?
(c) Does scikit-learn have some characteristics that would
attenuate the loss of the listed developers (e.g., detailed
documentation)?
Thanks in advance for your collaboration,
Guilherme Avelino
PhD Student
Applied Software Engineering Group (ASERG)
UFMG, Brazil
http://aserg.labsoft.dcc.ufmg.br/
--
Prof. Guilherme Amaral Avelino
Universidade Federal do Piauí
Departamento de Computação
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general