Re: [Scikit-learn-general] scikit-learn Truck Factor

Rafael Calsaverini Wed, 12 Aug 2015 09:34:38 -0700

Yeah, I think that while code is obviously important, the amount of work
open source maintainers do that is NOT code is huge. And it's often easier
to replace someone who writes code than someone who reviews pull requests,
makes decisions, speaks at conferences, writes and enforces quality
guidelines, etc.


On Wed, 12 Aug 2015 13:18, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> If something like a "truck factor" really needs to be computed, I think it
> would be more meaningful to quantify how essential each of the
> contributions is. For instance, how complex a particular
> function/class/method is, how often it gets used/called in the modules, and
> how important it is for the user base. But here too, one needs to pay
> attention to detail. A trivial example: the iris dataset is probably
> loaded from the dataset module more often than the SVM implementation(s)
> are used, yet the latter is arguably a more important part of scikit-learn.
> Also, scikit-learn is the effort of many great people in the core team, and
> I would find it unfair to weigh their importance against each other.
>
> Another example: when I was working on a contribution to scikit-learn, it
> was probably me who committed 95% of the code. However, I would say that
> the implementation was 95% the core developers' work in terms of careful
> revisions, great ideas, and insightful suggestions. In short: although it
> was me who committed the code, it was really the core team's effort. Thus,
> I wouldn't find it fair to quantify importance by number of commits or
> lines of code.
>
> Best,
> Sebastian
>
> On Aug 12, 2015, at 9:45 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
> I find that list somewhat obscure, and reading your section on Code
> Authorship gives me some sense of why. All of those people have been very
> important contributors to the project, and I'd think the absence of Gaël,
> Andreas and Olivier alone would be very damaging, if only because of their
> dedication to the collaborative maintenance involved. Yet despite his top
> score Fabian has not actively contributed for years and would be quite
> unfamiliar with many of the files he created, while I think Mathieu Blondel
> and Alexandre Gramfort, for example, would provide substantial code
> coverage without those seven (although they may not be interested in the
> maintenance).
>
> I feel the approach is problematic because of the weight it puts on
> "number of commits" (if that's how I should interpret "the number of
> changes made in f by D"). Apart from the susceptibility of this measure to
> individual author preferences, the project in its infancy favoured small
> commits (because the team was small), but more recently has preferred large
> contributions, and has frequently squashed contributions with large commit
> histories into single commits.
>
> Have you considered measures of "number of deliveries" apart from number
> of commits? While counting lines of code presents other problems, the
> number of months in which a user committed changes to a file might be a
> more realistic representation.
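
The "months active" measure suggested above can be sketched as follows. This is only an illustration under assumptions: the input format (a list of `(author, path, date)` records) and the function name `active_months` are hypothetical, and extracting such records from a git history is left out.

```python
from collections import defaultdict
from datetime import date


def active_months(commits):
    """Count, per (author, path), the distinct calendar months with a change.

    `commits` is a hypothetical iterable of (author, path, date) records.
    """
    months = defaultdict(set)
    for author, path, day in commits:
        # Several commits in the same month collapse into one entry.
        months[(author, path)].add((day.year, day.month))
    return {key: len(seen) for key, seen in months.items()}


example = [
    ("ann", "svm.py", date(2015, 1, 3)),
    ("ann", "svm.py", date(2015, 1, 20)),
    ("ann", "svm.py", date(2015, 3, 1)),
    ("bob", "svm.py", date(2015, 3, 2)),
]
print(active_months(example))
```

Because commits in the same month count once, a contribution squashed into a single commit and the same work split into many small commits within that month get the same score.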
>
> A number of factors attenuate developer loss: documentation and overall
> code quality; fairly open and wide contribution, with regular in-person
> interaction for a large number of contributors; GSoC and other
> project-based involvement through which new contributors become very
> familiar with parts of the code; and the standardness of the algorithms
> implemented in scikit-learn, meaning they can be maintained on the basis of
> reference works (a broader form of documentation).
>
> On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
>
>> As part of my PhD research on code authorship, we calculated the Truck
>> Factor (TF) of some popular GitHub repositories.
>>
>> As you probably know, the Truck (or Bus) Factor designates the minimal
>> number of developers that have to be hit by a truck (or quit) before a
>> project is incapacitated. In our work, we consider that a system is in
>> trouble if more than 50% of its files become orphaned (i.e., without a main
>> author).
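
To make the definition above concrete, here is a minimal greedy sketch in Python. It is an illustration under assumptions, not necessarily the preprint's procedure: the input `file_authors` (a mapping from file path to the set of its main authors) and the function name are hypothetical, and how "main author" is decided is left out. Greedy removal yields an upper bound on the truck factor, not a guaranteed minimum.

```python
from collections import Counter


def greedy_truck_factor(file_authors, threshold=0.5):
    """Return the developers removed before more than `threshold` of files are orphaned.

    `file_authors` is a hypothetical dict: file path -> set of main authors.
    """
    remaining = {path: set(devs) for path, devs in file_authors.items()}
    total = len(remaining)
    removed = []
    if total == 0:
        return removed
    while True:
        orphans = sum(1 for devs in remaining.values() if not devs)
        if orphans > threshold * total:
            return removed
        counts = Counter(dev for devs in remaining.values() for dev in devs)
        if not counts:
            return removed  # nobody left to remove
        # Remove the developer covering the most files; ties broken by name.
        top = min(counts, key=lambda dev: (-counts[dev], dev))
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)


example = {
    "a.py": {"ann"},
    "b.py": {"ann"},
    "c.py": {"ann", "bob"},
    "d.py": {"bob"},
}
print(len(greedy_truck_factor(example)))
```

In this toy input, removing "ann" orphans exactly half of the files, which does not exceed the 50% threshold, so "bob" has to go as well.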
>>
>> More details on our work in this preprint:
>> https://peerj.com/preprints/1233
>>
>> We calculated the TF for scikit-learn and obtained a value of 7.
>>
>> The developers responsible for this TF are:
>>
>> Fabian Pedregosa - author of 22% of the files
>> Gael Varoquaux - author of 13% of the files
>> Andreas Mueller - author of 12% of the files
>> Olivier Grisel - author of 10% of the files
>> Lars Buitinck - author of 10% of the files
>> Jake Vanderplas - author of 6% of the files
>> Vlad Niculae - author of 5% of the files
>>
>> To validate our results, we would like to ask scikit-learn developers the
>> following three brief questions:
>>
>> (a) Do you agree that the listed developers are the main developers of
>> scikit-learn?
>>
>> (b) Do you agree that scikit-learn will be in trouble if the listed
>> developers leave the project (e.g., if they win the lottery, to be less
>> morbid)?
>>
>> (c) Does scikit-learn have some characteristics that would attenuate the
>> loss of the listed developers (e.g., detailed documentation)?
>>
>> Thanks in advance for your collaboration,
>>
>> Guilherme Avelino
>> PhD Student
>> Applied Software Engineering Group (ASERG)
>> UFMG, Brazil
>> http://aserg.labsoft.dcc.ufmg.br/
>>
>> --
>> Prof. Guilherme Amaral Avelino
>> Universidade Federal do Piauí
>> Departamento de Computação
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>

