Re: [sympy] Updated .mailmap (Total number of authors of SymPy)

Aaron Meurer Mon, 19 Dec 2011 16:58:42 -0800

2011/12/19 Ondřej Čertík <[email protected]>:
> On Sun, Dec 18, 2011 at 10:06 PM, Aaron Meurer <[email protected]> wrote:
>> Hi.
>>
>> Thanks to this GCI task
>> (http://www.google-melange.com/gci/task/view/google/gci2011/7242260),
>> we now have an updated .mailmap file and the AUTHORS/aboutus have also
>> been updated.  Several people were found to be missing from those files
>> and were added.
>>
>> If anyone has committed with more than one email address or name
>> spelling, please check the .mailmap file to verify that we are using
>> your preferred address/spelling.  Or if you see any incorrect entries
>> or missing people, please let me know.
>>
>> The good news here is that we can now exactly determine how many
>> authors SymPy has.  As of now, we have 169 people listed in the
>> AUTHORS file and 165 people from the git history (from the command git
>> log --format="%aN <%aE>" | sort -u  | wc -l, which thanks to the
>> updated .mailmap, is correct). There are five people in the AUTHORS
>> file who are not in the git metadata, either because they were not
>> attributed correctly or because their last contribution was from
>> before the move from SVN when we lost some history.  They are now
>> marked with a * in the AUTHORS file.  There is one person who is not
>> listed in the AUTHORS file (by request).
>>
>> So this totals 170 authors, as of right now.
>>
>> We can now also determine when the nth author was added by looking at
>> the order in AUTHORS/aboutus, adding 1 for all but the first few
>> contributors because of the person who is not listed (I'm not sure
>> where exactly the line is drawn, Ondrej?), since they are in order in
>
> You mean when to start adding +1? This can be determined from the git
> history. But anyway
> I don't think it's a big deal.


The problem is that the git history doesn't go all the way back.

>
> There are other forms of contributions, for example many people just
> report what to fix where, but
> somebody else actually writes the patch and so on.
> For some people, I tried to use their name + address if they actually
> sent a patch (in form of a diff)
> long time ago, and I know at least one case, where the name is just a
> nickname. And so on.
>
> Also, some contributions are fixing technical stuff, like setup.py, or
> some typo in documentation, or a Makefile in docs and so on, or fixing
> pyglet (let's say), and parts of it might not be in sympy anymore.
>
> So in any case, the total number is only approximate, especially for people
> who submitted only 1 patch. For people with a few and more patches,
> the number should be pretty accurate. From git history:
>
> minimum number of patches: number of people
> 1: 166
> 2: 118
> 3: 101
> 4: 88
> 5: 75
> 6: 68
>
> and so on. Those should be quite solid numbers. So while we can
> discuss whether the total number should be 165 or 170, I think that
> people with 3 or more patches will count as solid contributors by all
> standards, and there are at least 100 of them.
>
> Finally, what really matters for the healthiness of the project are
> these numbers in let's say past year:
>
> git shortlog -ns --since="1 year ago"
>
> I get:
>
> 1: 91
> 2: 68
> 3: 61
> 4: 54
> 5: 47
> 6: 44
>
> Those are accurate, up-to-date numbers. It is also nice to plot these
> as a graph, let's say with contributors on the x-axis and the
> normalized number of patches on the y-axis. I know Fernando Perez made
> such graphs in his presentation a few months back.
>
>> the file.  From this, we can see that the 100th author, Cristóvão
>> Sousa, contributed in November 2010.  And I'm convinced that we will
>> get our 200th contributor at some point in 2012. To put that in
>> perspective, Ondrej started the project in 2005.
>>
>> This does not include people (including many GCI students) who have
>> contributed to other GitHub projects only, like the website or SymPy
>> Live. These probably deserve their own AUTHORS files.
>>
>> From now on, we need to make sure to keep both .mailmap and
>> AUTHORS/aboutus up-to-date, so that we can easily find people missing
>> from the AUTHORS/aboutus from the git history.
>
> Anyway, thanks for fixing the .mailmap. In any case, ~170 is a good number. :)
>
> Ondrej

I completely agree with you.  The main reason for doing this was for
attribution purposes.  Over the course of doing this, Jim Zheng (the
GCI student) and I found no fewer than 14 people who were not listed
in the AUTHORS file.  These were not all recent contributions either.
To me this is shameful, and I want to prevent it from happening again.

It was very difficult to find these people before, without
meticulously going through each name in the AUTHORS file and each name
in the git history. Now, with .mailmap updated, you just have to take
the line number of the last name in AUTHORS, subtract 9 (your name is
on line 7, there are 5 people there not in git, and 1 person in git
but not there).  If this number is the same as the output of git
log --format="%aN <%aE>" | sort -u  | wc -l, then it is up-to-date.
If not, then there are people missing (or .mailmap needs to be updated
again).
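
The count comparison above only says whether something is out of sync. A minimal Python sketch of how the missing names themselves could be listed is below. It assumes the AUTHORS names have already been extracted into a plain list (the parsing of the AUTHORS file is not shown, since its exact layout is not described here), and the helper names strip_email, git_author_names and compare are made up for illustration. Note that it compares by name only, which matches the sort -u count only when .mailmap maps every person to a single name.

```python
import subprocess


def strip_email(entry):
    """Turn 'Jane Doe <jane@example.org>' into 'Jane Doe'."""
    return entry.split(" <", 1)[0].strip()


def git_author_names(repo="."):
    """Return the unique author names in the git history of `repo`.

    Uses the same format string as the command in the message above;
    %aN and %aE are the placeholders that respect .mailmap.
    """
    out = subprocess.run(
        ["git", "log", "--format=%aN <%aE>"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return {strip_email(line) for line in out.splitlines() if line.strip()}


def compare(git_names, authors_names):
    """Report names that appear on one side but not the other."""
    git_names, authors_names = set(git_names), set(authors_names)
    return {
        # committed, but never added to AUTHORS
        "missing_from_authors": sorted(git_names - authors_names),
        # listed in AUTHORS, but absent from git (e.g. pre-SVN-move work)
        "not_in_git": sorted(authors_names - git_names),
    }
```

Calling compare(git_author_names(), names_from_authors_file) would then be expected to report the five starred people under "not_in_git" and the one unlisted person under "missing_from_authors"; anything else would point to a missing AUTHORS entry or a .mailmap gap.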

The statistical outcomes of this, including the total number of
authors, are just secondary to the goal of attribution.  Personally, I
think that more impressive than the fact that we have had 170 authors
is the growth in the number of authors.  Aside from the git shortlog
graphs that we already know about, I would be interested to see a
graph of people by their first contribution over time (say, grouped
into periods of three months or so, so that you can see trends).  From
the data I've already seen, I'm pretty sure that this graph would be
increasing.  Perhaps if someone has some free time they can make one.
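
As a starting point for such a graph, here is a rough Python sketch that counts new authors per calendar quarter. It assumes the input is one line per commit in the form "name<TAB>YYYY-MM-DD", which is what git log --format="%aN%x09%ad" --date=short should produce. The function names are hypothetical, and the three-month grouping is just the calendar quarter. Since the git history does not go all the way back, the earliest quarters would be undercounted.

```python
from collections import Counter
from datetime import date


def first_commit_dates(log_lines):
    """Map each author name to the date of their earliest commit."""
    first = {}
    for line in log_lines:
        if not line.strip():
            continue  # ignore blank lines in the log output
        name, _, day = line.rpartition("\t")
        when = date.fromisoformat(day.strip())
        if name not in first or when < first[name]:
            first[name] = when
    return first


def new_authors_per_quarter(log_lines):
    """Count authors by the (year, quarter) of their first commit."""
    counts = Counter()
    for when in first_commit_dates(log_lines).values():
        quarter = (when.month - 1) // 3 + 1
        counts[(when.year, quarter)] += 1
    return dict(sorted(counts.items()))
```

The resulting dictionary, keyed by (year, quarter), can be fed to any plotting tool; a running sum of its values gives the total number of authors over time.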

To me, there are two important signs of the health of a project that
can be gleaned from the commit history (only looking at the authors
and the commit dates).  The first is the number of core contributors.
This is seen from the graph that you suggest and that Fernando Perez
made.  The second is the number of new contributors.  For this second
statistic, you can also consider how many commits they made if you
want, but I think it's also safe to just ignore the strength of each
contribution, as they will fit into some normal distribution overall,
so that on average the more new contributors you have in total, the
more strong contributors you will get.
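
For the first statistic, the "number of patches: number of people" tables quoted above can be reproduced from git shortlog -ns output. The sketch below is a minimal Python version, assuming each output line is a commit count followed by whitespace and the author name; the function names are made up for illustration.

```python
def parse_shortlog(text):
    """Parse `git shortlog -ns` output into {author name: commit count}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        number, name = line.split(None, 1)
        counts[name.strip()] = int(number)
    return counts


def people_with_at_least(counts, thresholds=range(1, 7)):
    """For each n, count the authors who have n or more commits."""
    return {
        n: sum(1 for c in counts.values() if c >= n)
        for n in thresholds
    }
```

Running it on the output of git shortlog -ns --since="1 year ago" instead would give the table for the past year.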

This second statistic is important because it gives a glimpse into the
growth rate of the project, and also because every project will
naturally lose contributors, since they are just volunteers, so this
is somewhat of a "replacement rate" for the project (very loosely
speaking, of course).

Aaron Meurer
