2011/12/19 Ondřej Čertík <[email protected]>:
> On Sun, Dec 18, 2011 at 10:06 PM, Aaron Meurer <[email protected]> wrote:
>> Hi.
>>
>> Thanks to this GCI task
>> (http://www.google-melange.com/gci/task/view/google/gci2011/7242260),
>> we now have an updated .mailmap file and the AUTHORS/aboutus have also
>> been updated. Several people were found to be missing from those files
>> and were added.
>>
>> If anyone has committed with more than one email address or name
>> spelling, please check the .mailmap file to verify that we are using
>> your preferred address/spelling. Or if you see any incorrect entries
>> or missing people, please let me know.
>>
>> The good news here is that we can now exactly determine how many
>> authors SymPy has. As of now, we have 169 people listed in the
>> AUTHORS file and 165 people from the git history (from the command git
>> log --format="%aN <%aE>" | sort -u | wc -l, which thanks to the
>> updated .mailmap, is correct). There are five people in the AUTHORS
>> file who are not in the git metadata, either because they were not
>> attributed correctly or because their last contribution was from
>> before the move from SVN when we lost some history. They are now
>> marked with a * in the AUTHORS file. There is one person who is not
>> listed in the AUTHORS file (by request).
>>
>> So this totals 170 authors, as of right now.
>>
>> We can now also determine when the nth author was added by looking at
>> the order in AUTHORS/aboutus, adding 1 for all but the first few
>> contributors because of the person who is not listed (I'm not sure
>> where exactly the line is drawn, Ondrej?), since they are in order in
>
> You mean when to start adding +1? This can be determined from the git
> history. But anyway I don't think it's a big deal.

The problem is that the git history doesn't go all the way back.

> There are other forms of contributions, for example many people just
> report what to fix where, but somebody else actually writes the patch
> and so on. For some people, I tried to use their name + address if
> they actually sent a patch (in the form of a diff) a long time ago,
> and I know of at least one case where the name is just a nickname.
> And so on.
>
> Also, some contributions are fixing technical stuff, like setup.py, or
> some typo in documentation, or a Makefile in docs and so on, or fixing
> pyglet (let's say), and parts of it might not be in sympy anymore.
>
> So in any case, the total number is only approximate, especially for
> people who submitted only 1 patch. For people with a few or more
> patches, the number should be pretty accurate. From git history:
>
> number of patches: number of people
> 1: 166
> 2: 118
> 3: 101
> 4: 88
> 5: 75
> 6: 68
>
> and so on. Those should be quite solid numbers. So while we can
> discuss whether the total number should be 165 or 170, I think that
> people with 3 or more patches will count as solid contributions by all
> standards, and there are at least 100 of them.
>
> Finally, what really matters for the healthiness of the project are
> these numbers in, let's say, the past year:
>
> git shortlog -ns --since="1 year ago"
>
> I get:
>
> 1: 91
> 2: 68
> 3: 61
> 4: 54
> 5: 47
> 6: 44
>
> Those are accurate, up-to-date numbers. Also, it would be nice to plot
> these as a graph, let's say contributors on the x-axis, and the
> normalized number of patches on the y-axis. I know Fernando Perez made
> these graphs in his presentation a few months back.
>
>> the file. From this, we can see that the 100th author, Cristóvão
>> Sousa, contributed in November 2010. And I'm convinced that we will
>> get our 200th contributor at some point in 2012. To put that in
>> perspective, Ondrej started the project in 2005.
>>
>> This does not include people (including many GCI students) who have
>> contributed to other GitHub projects only, like the website or SymPy
>> Live. These probably deserve their own AUTHORS files.
>>
>> From now on, we need to make sure to keep both .mailmap and
>> AUTHORS/aboutus up-to-date, so that we can easily find people missing
>> from the AUTHORS/aboutus from the git history.
>
> Anyway, thanks for fixing the .mailmap. In any case, ~170 is a good
> number. :)
>
> Ondrej

I completely agree with you. The main reason for doing this was for
attribution purposes. Over the course of doing this, Jim Zheng (the GCI
student) and I found no fewer than 14 people who were not listed in the
AUTHORS file. These were not all recent contributions either. To me
this is shameful, and I want to prevent it from happening again.

It was very difficult to find these people before, without meticulously
going through each name in the AUTHORS file and each name in the git
history. Now, with .mailmap updated, you just have to take the line
number of the last name in AUTHORS and subtract 9 (your name is on line
7, there are 5 people there not in git, and 1 person in git but not
there). If this number is the same as the output of
git log --format="%aN <%aE>" | sort -u | wc -l, then it is up-to-date.
If not, then there are people missing (or .mailmap needs to be updated
again).

The statistical outcomes of this, including the total number of
authors, are just secondary to the goal of attribution. Personally, I
think that more impressive than the fact that we have had 170 authors
is the increase in the number of authors. Aside from the git shortlog
graphs that we already know about, I would be interested to see a graph
of people by their first contribution over time (say, cluster them into
periods of three months or so, so that you can see trends). From the
data I've already seen, I'm pretty sure that this graph would be
increasing. Perhaps if someone has some free time they can make one.

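As a rough starting point for the first-contribution graph, here is a minimal Python sketch that buckets authors by the calendar quarter of their earliest commit. It is only an illustration under assumptions: the function name is made up, the input is assumed to be lines of the form "name<TAB>YYYY-MM-DD" (for example from git log --format="%aN%x09%ad" --date=short, where %aN applies .mailmap), and the sample names are placeholders, not real contributors.

```python
from collections import Counter
from datetime import date


def first_contributions_by_quarter(log_lines):
    """Count authors by the quarter of their earliest commit.

    Each line is assumed to look like "<author>\t<YYYY-MM-DD>".
    Returns a dict mapping (year, quarter) to the number of new authors,
    ordered chronologically.
    """
    first_seen = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        author, _, day = line.rpartition("\t")
        when = date.fromisoformat(day)
        # Keep only the earliest date seen for each author.
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
    counts = Counter(
        (d.year, (d.month - 1) // 3 + 1) for d in first_seen.values()
    )
    return dict(sorted(counts.items()))


# Made-up sample data, not taken from the SymPy history.
sample = [
    "author-a\t2011-11-02",
    "author-b\t2011-12-19",
    "author-a\t2010-11-15",
    "author-c\t2011-02-01",
]
print(first_contributions_by_quarter(sample))
```

The resulting counts per quarter could then be plotted with any plotting tool. Note that the result is only as good as the history that git has, so authors whose first contribution predates the move from SVN would show up later than they should.
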
To me, there are two important signs of the health of a project that
can be gleaned from the commit history (only looking at the authors and
the commit dates). The first is the number of core contributors. This
is seen from the graph that you suggest and that Fernando Perez made.
The second is the number of new contributors. For this second
statistic, you can also consider how many commits they made if you
want, but I think it's also safe to just ignore the strength of each
contribution, as they will overall fit into some normal distribution,
so that on average the more new contributors you have overall, the more
strong contributors you will get. This second statistic is important
because it shows a glimpse into the growth rate of the project, and
also because every project will naturally lose contributors, since they
are just volunteers, so this is somewhat of a "replacement rate" for
the project (very loosely speaking, of course).

Aaron Meurer
