Thank you for the answers. I really appreciate the feedback.

Our research is still in progress, and the answers to this survey will
help us interpret the results and improve our approach.



2015-08-12 14:36 GMT-03:00 Andreas Mueller <t3k...@gmail.com>:

> I disagree.
> Pandas has a truck-factor of one, Jeff Reback.
> My impression is that Wes did not catch up with the current codebase and
> would therefore not be the ideal maintainer any more.
> If there is no "hand-over" of the arcane knowledge between generations, a
> project will die.
> Numpy has a truck-factor between two and four I think.
>
>
>
> On 08/12/2015 09:57 AM, josef.p...@gmail.com wrote:
>
>
>
> On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> I find that list somewhat obscure, and reading your section on Code
>> Authorship gives me some sense of why. All of those people have been very
>> important contributors to the project, and I'd think the absence of Gaël,
>> Andreas and Olivier alone would be very damaging, if only because of their
>> dedication to the collaborative maintenance involved. Yet despite his top
>> score Fabian has not actively contributed for years and would be quite
>> unfamiliar with many of the files he created, while I think Mathieu Blondel
>> and Alexandre Gramfort, for example, would provide substantial code
>> coverage without those seven (although they may not be interested in the
>> maintenance).
>>
>> I feel the approach is problematic because of the weight it puts on
>> "number of commits" (if that's how I should interpret "the number of
>> changes made in f by D"). Apart from the susceptibility of this measure to
>> individual author preferences, the project in its infancy favoured small
>> commits (because the team was small), but more recently has preferred large
>> contributions, and has frequently squashed contributions with large commit
>> histories into single commits.
>>
>> Have you considered measures of "number of deliveries" apart from number
>> of commits? While counting lines of code presents other problems, the
>> number of months in which a user committed changes to a file might be a
>> more realistic representation.
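
The "months in which a user committed changes to a file" measure suggested above could be sketched roughly as follows. This is only an illustration: the function name `active_months` and the (author, path, date) input format are assumptions made for the example, not part of the study or of any existing tool.

```python
from collections import defaultdict
from datetime import date


def active_months(commits):
    """Count distinct calendar months in which each author touched each file.

    `commits` is an iterable of (author, path, commit_date) tuples.
    This input format is hypothetical, e.g. something extracted from a git log.
    """
    months = defaultdict(set)
    for author, path, when in commits:
        # Several commits in the same month collapse into one entry,
        # so many small commits and one squashed commit count the same.
        months[(author, path)].add((when.year, when.month))
    return {key: len(seen) for key, seen in months.items()}


example_commits = [
    ("alice", "a.py", date(2015, 1, 3)),
    ("alice", "a.py", date(2015, 1, 20)),
    ("alice", "a.py", date(2015, 3, 1)),
    ("bob", "a.py", date(2015, 3, 2)),
]
print(active_months(example_commits))
```

Because commits within one month are merged, the count is less sensitive to commit-size habits and to squashing than a raw commit count.
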
>>
>> A number of factors attenuate developer loss: documentation and overall
>> code quality; fairly open and wide contribution, with regular in-person
>> interaction for a large number of contributors; GSoC and other
>> project-based involvement, through which new contributors become very familiar
>> with parts of the code; and the standardness of the algorithms implemented
>> in scikit-learn, meaning they can be maintained on the basis of reference
>> works (a broader documentation).
>>
>
>
> As an extreme example, pydata/pandas has a truck factor of one. But the one
> has already been "hit".
>
> I think the truck factor can be very misleading for projects in the second
> generation.
> For old projects like scipy or numpy (which I didn't see), the truck factor
> might be quite large and take turnover into account, even if the short-run
> truck factor is relatively low.
>
> Josef
>
>
>
>>
>> On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
>>
>>> As part of my PhD research on code authorship, we calculated the Truck
>>> Factor (TF) of some popular GitHub repositories.
>>>
>>> As you probably know, the Truck (or Bus) Factor designates the minimal
>>> number of developers that have to be hit by a truck (or quit) before a
>>> project is incapacitated. In our work, we consider that a system is in
>>> trouble if more than 50% of its files become orphaned (i.e., without a main
>>> author).
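
To make the definition above concrete, here is a minimal sketch of one way to estimate a truck factor under the 50% orphaned-files criterion. It assumes each file already has a known set of main authors and uses a simple greedy heuristic (repeatedly remove the author of the most files); both the input format and the function name `truck_factor` are assumptions for illustration, and the procedure in the preprint may differ in its details.

```python
from collections import Counter


def truck_factor(file_authors, threshold=0.5):
    """Greedy estimate of the truck factor.

    `file_authors` maps each file path to the set of its main authors
    (hypothetical input; how authorship is assigned is out of scope here).
    Returns (truck_factor, removed_authors).
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        # A file is orphaned once none of its main authors is left.
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned > threshold * total:
            break
        counts = Counter(a for authors in remaining.values() for a in authors)
        if not counts:
            break
        # Remove the author covering the most files; ties are broken by name.
        top = min(counts, key=lambda a: (-counts[a], a))
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return len(removed), removed


example = {
    "a.py": {"alice"},
    "b.py": {"alice"},
    "c.py": {"alice", "bob"},
    "d.py": {"bob"},
    "e.py": {"carol"},
}
print(truck_factor(example))  # (2, ['alice', 'bob'])
```

In the toy example, removing alice orphans two of five files (40%), which is not yet more than half; removing bob as well orphans four of five, so the estimate is 2.
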
>>>
>>> More details on our work in this preprint:
>>> https://peerj.com/preprints/1233
>>>
>>> We calculated the TF for scikit-learn and obtained a value of 7.
>>>
>>> The developers responsible for this TF are:
>>>
>>> Fabian Pedregosa - author of 22% of the files
>>> Gael Varoquaux - author of 13% of the files
>>> Andreas Mueller - author of 12% of the files
>>> Olivier Grisel - author of 10% of the files
>>> Lars Buitinck - author of 10% of the files
>>> Jake Vanderplas - author of 6% of the files
>>> Vlad Niculae - author of 5% of the files
>>>
>>> To validate our results, we would like to ask scikit-learn developers
>>> the following three brief questions:
>>>
>>> (a) Do you agree that the listed developers are the main developers of
>>> scikit-learn?
>>>
>>> (b) Do you agree that scikit-learn will be in trouble if the listed
>>> developers leave the project (e.g., if they win the lottery, to be less
>>> morbid)?
>>>
>>> (c) Does scikit-learn have some characteristics that would attenuate the
>>> loss of the listed developers (e.g., detailed documentation)?
>>>
>>> Thanks in advance for your collaboration,
>>>
>>> Guilherme Avelino
>>> PhD Student
>>> Applied Software Engineering Group (ASERG)
>>> UFMG, Brazil
>>> http://aserg.labsoft.dcc.ufmg.br/
>>>
>>> --
>>> Prof. Guilherme Amaral Avelino
>>> Universidade Federal do Piauí
>>> Departamento de Computação
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>


-- 
Prof. Guilherme Amaral Avelino
Universidade Federal do Piauí
Departamento de Computação