On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman <joel.noth...@gmail.com> wrote:

> I find that list somewhat obscure, and reading your section on Code
> Authorship gives me some sense of why. All of those people have been very
> important contributors to the project, and I'd think the absence of Gaël,
> Andreas and Olivier alone would be very damaging, if only because of their
> dedication to the collaborative maintenance involved. Yet, despite his top
> score, Fabian has not actively contributed for years and would be quite
> unfamiliar with many of the files he created, while I think Mathieu Blondel
> and Alexandre Gramfort, for example, would provide substantial code
> coverage without those seven (although they may not be interested in the
> maintenance).
>
> I feel the approach is problematic because of the weight it puts on
> "number of commits" (if that's how I should interpret "the number of
> changes made in f by D"). Apart from the susceptibility of this measure to
> individual author preferences, the project in its infancy favoured small
> commits (because the team was small), but more recently has preferred large
> contributions, and has frequently squashed contributions with large commit
> histories into single commits.
>
> Have you considered measures of "number of deliveries" apart from number
> of commits? While counting lines of code presents other problems, the
> number of months in which a user committed changes to a file might be a
> more realistic representation.
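
A minimal sketch of the "number of months" measure suggested above, assuming
commit records are already available as (author, path, date) tuples (for
example, extracted from the git history). The record format, the function name
`months_active`, and the toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def months_active(commits):
    """Map each (file, author) pair to the number of distinct calendar
    months in which that author committed a change to that file.

    `commits` is an iterable of hypothetical (author, path, date) records.
    """
    months = defaultdict(set)
    for author, path, day in commits:
        # Several commits in the same month count only once.
        months[(path, author)].add((day.year, day.month))
    return {key: len(ms) for key, ms in months.items()}


# Toy data: alice made several small commits within one month, while bob's
# work arrived as one squashed commit in each of two different months.
commits = [
    ("alice", "a.py", date(2011, 1, 3)),
    ("alice", "a.py", date(2011, 1, 20)),
    ("alice", "a.py", date(2011, 1, 21)),
    ("bob", "a.py", date(2015, 6, 1)),
    ("bob", "a.py", date(2015, 8, 1)),
]
counts = months_active(commits)
print(counts)  # {('a.py', 'alice'): 1, ('a.py', 'bob'): 2}
```

By raw commit count alice leads 3 to 2; by active months bob leads 2 to 1,
which is less sensitive to how finely each author splits or squashes commits.
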
>
> A number of factors attenuate developer loss: documentation and overall
> code quality; fairly open and wide contribution, with regular in-person
> interaction for a large number of contributors; GSoC and other
> project-based involvement, which makes new contributors very familiar
> with parts of the code; and the standard nature of the algorithms implemented
> in scikit-learn, meaning they can be maintained on the basis of reference
> works (a broader documentation).
>


As an extreme example, pydata/pandas has a truck factor of one. But that one
has already been "hit".

I think the truck factor can be very misleading for projects in their second
generation. For old projects like scipy or numpy (which I didn't see), a truck
factor that takes turnover into account might be quite large, even if the
short-run truck factor is relatively low.

Josef



>
> On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
>
>> As part of my PhD research on code authorship, we calculated the Truck
>> Factor (TF) of some popular GitHub repositories.
>>
>> As you probably know, the Truck (or Bus) Factor designates the minimal
>> number of developers that have to be hit by a truck (or quit) before a
>> project is incapacitated. In our work, we consider that a system is in
>> trouble if more than 50% of its files become orphaned (i.e., without a main
>> author).
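
A minimal sketch of how such a truck factor could be estimated, under two
assumptions that are mine rather than the preprint's: the main authors of each
file are simply given as input (the preprint derives them from its own
authorship measure), and developers are removed greedily, always the one who
is main author of the most files still covered. The 50% threshold comes from
the definition above; the function name `truck_factor` and the toy data are
hypothetical.

```python
from collections import Counter


def truck_factor(file_authors, threshold=0.5):
    """Greedy truck-factor estimate.

    `file_authors` maps each file to the set of its main authors. The
    developer who is main author of the most still-covered files is removed
    repeatedly until more than `threshold` of all files are orphaned (have no
    main author left). Returns the removed developers in removal order; the
    truck factor is the length of that list.
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = []
    while sum(1 for a in remaining.values() if not a) <= threshold * total:
        counts = Counter(a for authors in remaining.values() for a in authors)
        if not counts:  # nobody left to remove
            break
        top, _ = counts.most_common(1)[0]
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return removed


# Toy data (hypothetical files and developers).
files = {
    "a.py": {"alice"},
    "b.py": {"alice", "bob"},
    "c.py": {"bob"},
    "d.py": {"carol"},
}
print(len(truck_factor(files)))  # 2
```

Because the removal is greedy, the result can overestimate the true minimum
that the definition asks for; an exact answer would require searching over
subsets of developers.
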
>>
>> More details on our work in this preprint:
>> https://peerj.com/preprints/1233
>>
>> We calculated the TF for scikit-learn and obtained a value of 7.
>>
>> The developers responsible for this TF are:
>>
>> Fabian Pedregosa - author of 22% of the files
>> Gael Varoquaux - author of 13% of the files
>> Andreas Mueller - author of 12% of the files
>> Olivier Grisel - author of 10% of the files
>> Lars Buitinck - author of 10% of the files
>> Jake Vanderplas - author of 6% of the files
>> Vlad Niculae - author of 5% of the files
>>
>> To validate our results, we would like to ask scikit-learn developers the
>> following three brief questions:
>>
>> (a) Do you agree that the listed developers are the main developers of
>> scikit-learn?
>>
>> (b) Do you agree that scikit-learn will be in trouble if the listed
>> developers leave the project (e.g., if they win the lottery, to be less
>> morbid)?
>>
>> (c) Does scikit-learn have some characteristics that would attenuate the
>> loss of the listed developers (e.g., detailed documentation)?
>>
>> Thanks in advance for your collaboration,
>>
>> Guilherme Avelino
>> PhD Student
>> Applied Software Engineering Group (ASERG)
>> UFMG, Brazil
>> http://aserg.labsoft.dcc.ufmg.br/
>>
>> --
>> Prof. Guilherme Amaral Avelino
>> Universidade Federal do Piauí
>> Departamento de Computação
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>