I drafted a reply on the issue tracker, then saw yours here, and I mostly agree.

a) Fabian (@fabianp) is not a very active developer any more. Most of his contributions were made a couple of years ago, so if he won the lottery, that wouldn't change much. I feel that @jnothman is missing from that list, as are @glouppe, @arjoly, @agramfort, @mblondel and likely others.

b) If we all won the lottery, I think the project would probably do better, or nothing would change. This is a hobby for most of us. @ogrisel and I are in the great position of being paid for our passion, so we have already won.

For the truck factor: I think if @GaelVaroquaux, @ogrisel and I got hit by a truck, that might be quite a hit for the project. If your list plus @jnothman, @glouppe, @arjoly, @agramfort and @mblondel got hit, it might not recover (not discounting other contributors, but these act as maintainers of large parts of the code).


On 08/12/2015 09:45 AM, Joel Nothman wrote:
I find that list somewhat obscure, and reading your section on Code Authorship gives me some sense of why. All of those people have been very important contributors to the project, and I'd think the absence of Gaël, Andreas and Olivier alone would be very damaging, if only because of their dedication to the collaborative maintenance involved. Yet despite his top score Fabian has not actively contributed for years and would be quite unfamiliar with many of the files he created, while I think Mathieu Blondel and Alexandre Gramfort, for example, would provide substantial code coverage without those seven (although they may not be interested in the maintenance).

I feel the approach is problematic because of the weight it puts on "number of commits" (if that's how I should interpret "the number of changes made in f by D"). Apart from the susceptibility of this measure to individual author preferences, the project in its infancy favoured small commits (because the team was small), but more recently it has preferred large contributions, and has frequently squashed contributions with large commit histories into single commits.

Have you considered measures of "number of deliveries" apart from number of commits? While counting lines of code presents other problems, the number of months in which a user committed changes to a file might be a more realistic representation.
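
To make the months-based suggestion concrete, here is a minimal sketch. It assumes the commit history has already been extracted into (author, path, date) tuples; that input format, the function name and the sample data are hypothetical and do not come from the preprint.

```python
from collections import defaultdict
from datetime import date


def active_months(commits):
    """Count the distinct calendar months in which each author touched each file.

    `commits` is an iterable of (author, path, day) tuples, where `day` is a
    datetime.date. This input format is an assumption made for illustration.
    """
    months = defaultdict(set)
    for author, path, day in commits:
        # Many commits in the same month collapse into a single entry,
        # so commit granularity (small vs. squashed) matters much less.
        months[(path, author)].add((day.year, day.month))
    return {key: len(value) for key, value in months.items()}


# Hypothetical history: "ann" makes three small commits in one month,
# "ben" makes one commit in each of two different months.
history = [
    ("ann", "tree.py", date(2012, 3, 1)),
    ("ann", "tree.py", date(2012, 3, 9)),
    ("ann", "tree.py", date(2012, 3, 20)),
    ("ben", "tree.py", date(2014, 6, 2)),
    ("ben", "tree.py", date(2015, 1, 15)),
]
scores = active_months(history)
```

Under this measure "ben" scores higher than "ann" on tree.py even though "ann" has more commits, which is the effect the suggestion is after.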

A number of factors attenuate developer loss: documentation and overall code quality; fairly open and wide contribution, with regular in-person interaction for a large number of contributors; GSoC and other project-based involvement, through which new contributors become very familiar with parts of the code; and the standardness of the algorithms implemented in scikit-learn, meaning they can be maintained on the basis of reference works (a broader documentation).

On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:

    As part of my PhD research on code authorship, we calculated the
    Truck Factor (TF) of some popular GitHub repositories.

    As you probably know, the Truck (or Bus) Factor designates the
    minimal number of developers that have to be hit by a truck (or
    quit) before a project is incapacitated. In our work, we consider
    that a system is in trouble if more than 50% of its files become
    orphan (i.e., without a main author).
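
The 50% orphan criterion can be illustrated with a simplified sketch. It assumes each file has exactly one main author and removes authors greedily, starting with the one who owns the most files. This is not the method from the preprint, which derives authorship from change history; the function name and sample data are hypothetical.

```python
from collections import Counter


def truck_factor(main_author, threshold=0.5):
    """Return (count, authors) for the smallest set of authors whose removal
    orphans more than `threshold` of the files.

    `main_author` maps each file path to its single main author. With one
    main author per file, removing the largest owners first is minimal.
    """
    total = len(main_author)
    if total == 0:
        return 0, []
    files_per_author = Counter(main_author.values())
    removed = []
    orphaned = 0
    for author, owned in files_per_author.most_common():
        removed.append(author)
        orphaned += owned
        if orphaned > threshold * total:
            break
    return len(removed), removed


# Hypothetical project with 10 files: alice owns 4, bob 3, carol 2, dave 1.
owners = {}
for name, owned in [("alice", 4), ("bob", 3), ("carol", 2), ("dave", 1)]:
    for i in range(owned):
        owners[f"{name}_{i}.py"] = name

tf, who = truck_factor(owners)
```

In this toy project, losing alice alone orphans 4 of 10 files, which is not more than half; losing alice and bob orphans 7, so the sketch reports a truck factor of 2.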

    More details on our work in this preprint:
    https://peerj.com/preprints/1233

    We calculated the TF for scikit-learn and obtained a value of 7.

    The developers responsible for this TF are:

    Fabian Pedregosa - author of 22% of the files
    Gael Varoquaux - author of 13% of the files
    Andreas Mueller - author of 12% of the files
    Olivier Grisel - author of 10% of the files
    Lars Buitinck - author of 10% of the files
    Jake Vanderplas - author of 6% of the files
    Vlad Niculae - author of 5% of the files

    To validate our results, we would like to ask scikit-learn
    developers the following three brief questions:

    (a) Do you agree that the listed developers are the main
    developers of scikit-learn?

    (b) Do you agree that scikit-learn will be in trouble if the
    listed developers leave the project (e.g., if they win the
    lottery, to be less morbid)?

    (c) Does scikit-learn have some characteristics that would
    attenuate the loss of the listed developers (e.g., detailed
    documentation)?

    Thanks in advance for your collaboration,

    Guilherme Avelino
    PhD Student
    Applied Software Engineering Group (ASERG)
    UFMG, Brazil
    http://aserg.labsoft.dcc.ufmg.br/

-- Prof. Guilherme Amaral Avelino
    Universidade Federal do Piauí
    Departamento de Computação

    
------------------------------------------------------------------------------

    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    <mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



