Thank you for the answers. I really appreciate the feedback. Our research is still in progress, and the answers we are receiving to this survey will help us better interpret the results and improve our approach.
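For readers following the thread, the definition quoted below (a system is in trouble once more than 50% of its files are orphaned, i.e. left without a main author) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `truck_factor`, the input mapping `authors_by_file`, and the greedy removal strategy are hypothetical choices made for this example, not the procedure described in the preprint, and the sample data is invented.

```python
from collections import Counter


def truck_factor(authors_by_file, threshold=0.5):
    """Greedy estimate of the truck factor.

    authors_by_file: hypothetical mapping of file name -> set of main authors.
    A file is orphaned once all of its main authors have been removed.
    Authors are removed one at a time, always picking the one who is a main
    author of the most files, until more than `threshold` of the files are
    orphaned. The number of removed authors is returned.
    """
    remaining = {f: set(authors) for f, authors in authors_by_file.items()}
    total = len(remaining)
    if total == 0:
        return 0

    removed = 0
    while True:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned / total > threshold:
            return removed

        # Count how many files each remaining author still covers.
        coverage = Counter(a for authors in remaining.values() for a in authors)
        if not coverage:
            # No authors left to remove (only reachable if threshold >= 1).
            return removed

        top_author = coverage.most_common(1)[0][0]
        for authors in remaining.values():
            authors.discard(top_author)
        removed += 1


# Invented sample data, for illustration only.
example = {
    "a.py": {"alice"},
    "b.py": {"alice"},
    "c.py": {"alice", "bob"},
    "d.py": {"bob"},
    "e.py": {"carol"},
}
print(truck_factor(example))
```

In this invented example, removing "alice" orphans two of five files, which is not yet more than half; removing "bob" as well orphans four of five, so the estimate is 2. How the main authors of each file are determined (commit counts, months of activity, or something else) is exactly the point debated in the replies below, and the sketch takes that mapping as given.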
2015-08-12 14:36 GMT-03:00 Andreas Mueller <t3k...@gmail.com>:

> I disagree.
> Pandas has a truck-factor of one, Jeff Reback.
> My impression is that Wes did not catch up with the current codebase and
> would therefore not be the ideal maintainer any more.
> If there is no "hand-over" of the arcane knowledge between generations, a
> project will die.
> Numpy has a truck-factor between two and four I think.
>
> On 08/12/2015 09:57 AM, josef.p...@gmail.com wrote:
>
> On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
>> I find that list somewhat obscure, and reading your section on Code
>> Authorship gives me some sense of why. All of those people have been very
>> important contributors to the project, and I'd think the absence of Gaël,
>> Andreas and Olivier alone would be very damaging, if only because of their
>> dedication to the collaborative maintenance involved. Yet despite his top
>> score Fabian has not actively contributed for years and would be quite
>> unfamiliar with many of the files he created, while I think Mathieu Blondel
>> and Alexandre Gramfort, for example, would provide substantial code
>> coverage without those seven (although they may not be interested in the
>> maintenance).
>>
>> I feel the approach is problematic because of the weight it puts on
>> "number of commits" (if that's how I should interpret "the number of
>> changes made in f by D"). Apart from the susceptibility of this measure to
>> individual author preferences, the project in infancy favoured small
>> commits (because the team was small), but more recently has preferred large
>> contributions, and has frequently squashed contributions with large commit
>> histories into single commits.
>>
>> Have you considered measures of "number of deliveries" apart from number
>> of commits? While counting lines of code presents other problems, the
>> number of months in which a user committed changes to a file might be a
>> more realistic representation.
>>
>> A number of factors attenuate developer loss: documentation and overall
>> code quality; fairly open and wide contribution, with regular in-person
>> interaction for a large number of contributors; GSoC and other
>> project-based involvement entailing new contributors become very familiar
>> with parts of the code; and the standardness of the algorithms implemented
>> in scikit-learn, meaning they can be maintained on the basis of reference
>> works (a broader documentation).
>
> As extreme example, pydata/pandas has truck factor of one. But the one has
> already been "hit".
>
> I think the truck factor can be very misleading for projects in the second
> generation.
> For old projects like scipy or numpy (which I didn't see), the truckfactor
> might be quite large and take turnover into account, even if the short run
> truck factor is relatively low.
>
> Josef
>
>> On 12 August 2015 at 22:57, Guilherme Avelino <gavel...@gmail.com> wrote:
>>
>>> As part of my PhD research on code authorship, we calculated the Truck
>>> Factor (TF) of some popular GitHub repositories.
>>>
>>> As you probably know, the Truck (or Bus) Factor designates the minimal
>>> number of developers that have to be hit by a truck (or quit) before a
>>> project is incapacitated. In our work, we consider that a system is in
>>> trouble if more than 50% of its files become orphan (i.e., without a main
>>> author).
>>>
>>> More details on our work in this preprint:
>>> https://peerj.com/preprints/1233
>>>
>>> We calculated the TF for scikit-learn and obtained a value of 7.
>>>
>>> The developers responsible for this TF are:
>>>
>>> Fabian Pedregosa - author of 22% of the files
>>> Gael varoquaux - author of 13% of the files
>>> Andreas Mueller - author of 12% of the files
>>> Olivier Grisel - author of 10% of the files
>>> Lars Buitinck - author of 10% of the files
>>> Jake Vanderplas - author of 6% of the files
>>> Vlad Niculae - author of 5% of the files
>>>
>>> To validate our results, we would like to ask scikit-learn developers
>>> the following three brief questions:
>>>
>>> (a) Do you agree that the listed developers are the main developers of
>>> scikit-learn?
>>>
>>> (b) Do you agree that scikit-learn will be in trouble if the listed
>>> developers leave the project (e.g., if they win in the lottery, to be less
>>> morbid)?
>>>
>>> (c) Does scikit-learn have some characteristics that would attenuate the
>>> loss of the listed developers (e.g., detailed documentation)?
>>>
>>> Thanks in advance for your collaboration,
>>>
>>> Guilherme Avelino
>>> PhD Student
>>> Applied Software Engineering Group (ASERG)
>>> UFMG, Brazil
>>> http://aserg.labsoft.dcc.ufmg.br/
>>>
>>> --
>>> Prof. Guilherme Amaral Avelino
>>> Universidade Federal do Piauí
>>> Departamento de Computação

--
Prof. Guilherme Amaral Avelino
Universidade Federal do Piauí
Departamento de Computação
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general