Re: Measuring Community Involvement (was Re: Contributor Summit planning)
Hi Duy,

On Wed, 15 Aug 2018, Duy Nguyen wrote:

> On Tue, Aug 14, 2018 at 7:43 PM Derrick Stolee wrote:
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By,
> > Reported-By, etc.).
> >
> > Using git repo:
> >
> > $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
> >
> >      66 Reviewed-by: Stefan Beller
> >      22 Reviewed-by: Jeff King
> >      19 Reviewed-by: Jonathan Tan
> >      12 Helped-by: Eric Sunshine
> >      11 Helped-by: Junio C Hamano
> >       9 Helped-by: Jeff King
> >       8 Reviewed-by: Elijah Newren
> >       7 Reported-by: Ramsay Jones
> >       7 Acked-by: Johannes Schindelin
> >       7 Acked-by: Brandon Williams
> >       6 Reviewed-by: Eric Sunshine
> >       6 Helped-by: Johannes Schindelin
> >       5 Mentored-by: Christian Couder
> >       5 Acked-by: Johannes Schindelin
> >       4 Reviewed-by: Jonathan Nieder
> >       4 Reviewed-by: Johannes Schindelin
> >       4 Helped-by: Stefan Beller
> >       4 Helped-by: René Scharfe
> >       3 Reviewed-by: Martin Ågren
> >       3 Reviewed-by: Lars Schneider
> >
> > (There does not appear to be enough density here to make a useful
> > metric.)
>
> If your database keeps mail relationship (e.g. what mail is replied to
> what according to In-Reply-To header) then look for mail replies to
> patches. I think we have a rough picture who are active reviewers with
> that.

Not really, as there is a high percentage of "on a tangent" replies in
many, many patch threads.

Ciao,
Dscho
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
Hi Junio,

On Tue, 14 Aug 2018, Junio C Hamano wrote:

> Jeff King writes:
>
> > On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
> >
> >> On 8/13/2018 5:54 PM, Jeff King wrote:
> >> > So I try not to think too hard on metrics, and just use them to get a
> >> > rough view on who is active.
> >>
> >> I've been very interested in measuring community involvement, with the
> >> knowledge that any metric is flawed and we should not ever say "this metric
> >> is how we measure the quality of a contributor". It can be helpful, though,
> >> to track some metrics and their change over time.
> >>
> >> Here are a few measurements we can make:
> >
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
>
> I do not think it covered e-mail at all, but there was git stats
> project several years ago (perhaps part of GSoC IIRC).
>
> > I think I mentioned "surviving lines" elsewhere, which I do like this
> > (and almost certainly stole from Junio a long time ago):
>
> Yeah, I recall that one as part of counting how many of 1244 lines
> Linus originally wrote still were in our codebase at around v1.6.0
> timeframe (the answer was ~220 IIRC) ;-)

And if you do not remember precisely, you can easily re-run `Linus` from
here: https://github.com/git/git/blob/todo/Linus

Ciao,
Dscho
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 7:43 PM Derrick Stolee wrote:
> 2. Number of other commit tag-lines (Reviewed-By, Helped-By,
> Reported-By, etc.).
>
> Using git repo:
>
> $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
>      66 Reviewed-by: Stefan Beller
>      22 Reviewed-by: Jeff King
>      19 Reviewed-by: Jonathan Tan
>      12 Helped-by: Eric Sunshine
>      11 Helped-by: Junio C Hamano
>       9 Helped-by: Jeff King
>       8 Reviewed-by: Elijah Newren
>       7 Reported-by: Ramsay Jones
>       7 Acked-by: Johannes Schindelin
>       7 Acked-by: Brandon Williams
>       6 Reviewed-by: Eric Sunshine
>       6 Helped-by: Johannes Schindelin
>       5 Mentored-by: Christian Couder
>       5 Acked-by: Johannes Schindelin
>       4 Reviewed-by: Jonathan Nieder
>       4 Reviewed-by: Johannes Schindelin
>       4 Helped-by: Stefan Beller
>       4 Helped-by: René Scharfe
>       3 Reviewed-by: Martin Ågren
>       3 Reviewed-by: Lars Schneider
>
> (There does not appear to be enough density here to make a useful
> metric.)

If your database keeps mail relationship (e.g. what mail is replied to
what according to In-Reply-To header) then look for mail replies to
patches. I think we have a rough picture who are active reviewers with
that.
--
Duy
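
A minimal sketch of the suggestion above, in Python. It assumes each mail is available as a record with the hypothetical keys `id`, `in_reply_to`, `author`, and `subject` (loosely modelled on the message-id, in-reply-to, author, and subject columns mentioned elsewhere in this thread), and that a patch thread can be recognised by "[PATCH" in the subject of its first mail. Both are illustrative assumptions, not anything the list tooling provides. It follows In-Reply-To links up to the thread root and counts replies to patch threads per author, skipping the patch author's own follow-ups.

```python
from collections import Counter


def thread_root(message_id, parent_of):
    """Follow In-Reply-To links upward and return the root message id.

    Stops when a message has no parent, when the parent is not in the
    archive, or when a cycle is detected.
    """
    seen = {message_id}
    current = message_id
    while True:
        parent = parent_of.get(current)
        if not parent or parent not in parent_of or parent in seen:
            return current
        seen.add(parent)
        current = parent


def count_patch_replies(messages):
    """Count, per author, replies that landed in threads started by a patch."""
    parent_of = {m["id"]: m["in_reply_to"] for m in messages}
    by_id = {m["id"]: m for m in messages}
    counts = Counter()
    for m in messages:
        root = by_id[thread_root(m["id"], parent_of)]
        if root["id"] == m["id"]:
            continue  # thread starter, not a reply
        if "[PATCH" not in root["subject"]:
            continue  # heuristic: only threads whose first mail is a patch
        if m["author"] == root["author"]:
            continue  # the patch author answering in their own thread
        counts[m["author"]] += 1
    return counts
```

Calling `count_patch_replies(messages).most_common(20)` would give a top-20 list in the style of the other tables in this thread. The count treats every reply in a patch thread as review activity, so tangential replies are included.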
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
Jeff King wrote:
> On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:
> > With the advent of public inbox, this is easy to obtain?
>
> For our project, yes. But I was thinking of a tool that could be used
> for other projects, too.

Nothing prevents public-inbox from being adopted by other projects :)

Fwiw, Linux Foundation has LKML at https://lore.kernel.org/lkml
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
Jeff King writes:

> On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
>
>> On 8/13/2018 5:54 PM, Jeff King wrote:
>> > So I try not to think too hard on metrics, and just use them to get a
>> > rough view on who is active.
>>
>> I've been very interested in measuring community involvement, with the
>> knowledge that any metric is flawed and we should not ever say "this metric
>> is how we measure the quality of a contributor". It can be helpful, though,
>> to track some metrics and their change over time.
>>
>> Here are a few measurements we can make:
>
> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

I do not think it covered e-mail at all, but there was git stats
project several years ago (perhaps part of GSoC IIRC).

> I think I mentioned "surviving lines" elsewhere, which I do like this
> (and almost certainly stole from Junio a long time ago):

Yeah, I recall that one as part of counting how many of 1244 lines
Linus originally wrote still were in our codebase at around v1.6.0
timeframe (the answer was ~220 IIRC) ;-)
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 12:36 PM Jeff King wrote:
>
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
>
> With the advent of public inbox, this is easy to obtain?

For our project, yes. But I was thinking of a tool that could be used
for other projects, too.

> > At one point I sent a patch series that would let shortlog group by
> > trailers. Nobody seemed all that interested and I didn't end up using it
> > for its original purpose, so I didn't polish it further. But I'd be
> > happy to re-submit it if you think it would be useful.
>
> I would think it is useful. Didn't Linus also ask for a related thing?
> https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/

He wanted grouping by committer, which we ended up adding as a separate
feature. I think there's some discussion of the trailer thing in that
thread.

-Peff
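
To illustrate what grouping by trailers could compute, here is a minimal Python sketch. It is not the shortlog patch series mentioned above, whose interface is not shown in this thread. The names `tally_trailers`, `TRAILER_RE`, and `EMAIL_RE`, and the `(author, message)` input shape, are assumptions made for illustration. Like the grep pipeline quoted in this thread, it scans every line of the message rather than only the final trailer block, but it anchors the match at the start of the line and folds `Reviewed-By` and `Reviewed-by` into one key.

```python
import re
from collections import Counter

# Matches lines such as "Reviewed-by: A U Thor <author@example.com>".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*-by):\s*(.+?)\s*$", re.IGNORECASE)
# Strips a trailing "<email>" part so counts are grouped by name only.
EMAIL_RE = re.compile(r"\s*<[^>]*>\s*$")


def tally_trailers(commits, skip=("Signed-off-by",)):
    """Count (trailer, person, commit author) triples.

    `commits` is an iterable of (author, message) pairs; how they are
    obtained from a repository is left to the caller.
    """
    counts = Counter()
    for author, message in commits:
        for line in message.splitlines():
            match = TRAILER_RE.match(line)
            if not match:
                continue
            key = match.group(1).capitalize()  # "Reviewed-By" -> "Reviewed-by"
            if key in skip:
                continue
            person = EMAIL_RE.sub("", match.group(2))
            counts[(key, person, author)] += 1
    return counts
```

Keeping the commit author in the key makes it possible to ask who reviews whom; summing over the author gives the flat per-trailer counts that the shell pipeline produces.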
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 12:36 PM Jeff King wrote:

> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

With the advent of public inbox, this is easy to obtain?

>
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> > etc.).
> >
> > Using git repo:
> >
> > $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
> At one point I sent a patch series that would let shortlog group by
> trailers. Nobody seemed all that interested and I didn't end up using it
> for its original purpose, so I didn't polish it further. But I'd be
> happy to re-submit it if you think it would be useful.

I would think it is useful. Didn't Linus also ask for a related thing?
https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:

> On 8/13/2018 5:54 PM, Jeff King wrote:
> > So I try not to think too hard on metrics, and just use them to get a
> > rough view on who is active.
>
> I've been very interested in measuring community involvement, with the
> knowledge that any metric is flawed and we should not ever say "this metric
> is how we measure the quality of a contributor". It can be helpful, though,
> to track some metrics and their change over time.
>
> Here are a few measurements we can make:

Thanks, it was nice to see a more comprehensive list in one spot.

It would be neat to have a tool that presents all of these
automatically, but I think the email ones are pretty tricky (most people
don't have the whole list archive sitting around).

> 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> etc.).
>
> Using git repo:
>
> $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20

At one point I sent a patch series that would let shortlog group by
trailers. Nobody seemed all that interested and I didn't end up using it
for its original purpose, so I didn't polish it further. But I'd be
happy to re-submit it if you think it would be useful.

The shell hackery here isn't too bad, but doing it internally is a
little faster, a little more robust (less parsing), and lets you show
more details about the commits themselves (e.g., who reviews whom).

> 3. Number of threads started by user.

You have "started" and "participated in". I guess one more would be
"closed", as in "solved a bug", but that is quite hard to tell without
looking at the content. Taking just the last person in a thread as the
closer means that an OP saying "thanks!" wrecks it. And somebody who
rants long enough that everybody else loses interest gets marked as a
closer. ;)

> If you have other ideas for fun measurements, then please let me know.

I think I mentioned "surviving lines" elsewhere, which I do like this
(and almost certainly stole from Junio a long time ago):

    # Obviously you can tweak this as you like, but the mass-imported bits
    # in compat and xdiff tend to skew the counts. It's possibly worth
    # counting language lines separately.
    git ls-files '*.c' '*.h' :^compat :^contrib :^xdiff |
    while read fn; do
      # eye candy
      echo >&2 "Blaming $fn..."
      # You can use more/fewer -C to dig more or less for code moves.
      # Possibly "-w" would help, though I doubt it shifts things more
      # than a few percent anyway.
      git blame -C --line-porcelain $fn
    done |
    perl -lne '/^author (.*)/ and print $1' |
    sort | uniq -c | sort -rn | head

The output right now is:

    35156 Junio C Hamano
    22207 Jeff King
    17466 Nguyễn Thái Ngọc Duy
    12005 Johannes Schindelin
    10259 Michael Haggerty
     9389 Linus Torvalds
     8318 Brandon Williams
     7776 Stefan Beller
     5947 Christian Couder
     4935 René Scharfe

which seems reasonable.

-Peff
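
For readers who would rather not use perl, the counting stage of the pipeline above (the `perl | sort | uniq -c` part) can be sketched in Python. This is an illustrative sketch only: the function name `count_surviving_lines` is made up here, and the input is assumed to be the concatenated text that `git blame --line-porcelain` prints for the files of interest. Source lines in that format are prefixed with a tab, so a source line that happens to start with "author " is not miscounted, and `author-mail` headers do not match because of the trailing space in the prefix.

```python
from collections import Counter


def count_surviving_lines(porcelain_text):
    """Count blamed lines per author from `git blame --line-porcelain` text.

    Every blamed line carries its own "author <name>" header in this
    format, so counting those headers counts surviving lines.
    """
    prefix = "author "
    counts = Counter()
    for line in porcelain_text.split("\n"):
        if line.startswith(prefix):
            counts[line[len(prefix):]] += 1
    return counts
```

`count_surviving_lines(text).most_common(10)` then corresponds to the final `sort -rn | head` step.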
Measuring Community Involvement (was Re: Contributor Summit planning)
On 8/13/2018 5:54 PM, Jeff King wrote:
> So I try not to think too hard on metrics, and just use them to get a
> rough view on who is active.

I've been very interested in measuring community involvement, with the
knowledge that any metric is flawed and we should not ever say "this metric
is how we measure the quality of a contributor". It can be helpful, though,
to track some metrics and their change over time.

Here are a few measurements we can make:

1. Number of (non-merge) commit author tag-lines.

Using git repo:

    $ git shortlog --no-merges --since 2017 -sne junio/next | head -n 20

       284 Nguyễn Thái Ngọc Duy
       257 Jeff King
       206 Stefan Beller
       192 brian m. carlson
       159 Brandon Williams
       149 Junio C Hamano
       137 Elijah Newren
       116 René Scharfe
       112 Johannes Schindelin
       105 Ævar Arnfjörð Bjarmason
        96 Jonathan Tan
        93 SZEDER Gábor
        78 Derrick Stolee
        76 Martin Ågren
        66 Michael Haggerty
        61 Eric Sunshine
        46 Christian Couder
        36 Phillip Wood
        35 Jonathan Nieder
        33 Thomas Gummerer

2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
etc.).

Using git repo:

    $ git log --since=2018-01-01 junio/next|grep by:|grep -v Signed-off-by:|sort|uniq -c|sort -nr|head -n 20

        66 Reviewed-by: Stefan Beller
        22 Reviewed-by: Jeff King
        19 Reviewed-by: Jonathan Tan
        12 Helped-by: Eric Sunshine
        11 Helped-by: Junio C Hamano
         9 Helped-by: Jeff King
         8 Reviewed-by: Elijah Newren
         7 Reported-by: Ramsay Jones
         7 Acked-by: Johannes Schindelin
         7 Acked-by: Brandon Williams
         6 Reviewed-by: Eric Sunshine
         6 Helped-by: Johannes Schindelin
         5 Mentored-by: Christian Couder
         5 Acked-by: Johannes Schindelin
         4 Reviewed-by: Jonathan Nieder
         4 Reviewed-by: Johannes Schindelin
         4 Helped-by: Stefan Beller
         4 Helped-by: René Scharfe
         3 Reviewed-by: Martin Ågren
         3 Reviewed-by: Lars Schneider

(There does not appear to be enough density here to make a useful
metric.)

3. Number of email messages sent.

Using mailing list repo:

    $ git shortlog --since 2017 -sne | head -n 20

      3749 Junio C Hamano
      2213 Stefan Beller
      2112 Jeff King
      1106 Nguyễn Thái Ngọc Duy
      1028 Johannes Schindelin
       965 Ævar Arnfjörð Bjarmason
       956 Brandon Williams
       947 Eric Sunshine
       890 Elijah Newren
       753 brian m. carlson
       677 Duy Nguyen
       646 Jonathan Nieder
       629 Derrick Stolee
       545 Christian Couder
       515 Jonathan Tan
       425 Johannes Schindelin
       425 Martin Ågren
       420 Jeff Hostetler
       420 SZEDER Gábor
       363 Phillip Wood

3. Number of threads started by user.

(For this and the measurements below, I imported emails into a SQL table
with columns [commit, author, date, message-id, in-reply-to, subject] and
ran queries)

    SELECT TOP 20
         COUNT(*) as NumSent
        ,[Author]
      FROM [git].[dbo].[mailing-list]
      WHERE [In-Reply-To] = ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')
      GROUP BY [Author]
      ORDER BY NumSent DESC

| NumSent | Author                     |
|---------|----------------------------|
| 76      | Junio C Hamano             |
| 64      | Stefan Beller              |
| 54      | Philip Oakley              |
| 50      | Nguyễn Thái Ngọc Duy       |
| 49      | Robert P. J. Day           |
| 47      | Christian Couder           |
| 36      | Ramsay Jones               |
| 34      | Elijah Newren              |
| 34      | SZEDER Gábor               |
| 33      | Johannes Schindelin        |
| 31      | Jeff King                  |
| 30      | Ævar Arnfjörð Bjarmason    |
| 24      | Jonathan Tan               |
| 22      | Alban Gruin                |
| 22      | brian m. carlson           |
| 18      | Randall S. Becker          |
| 15      | Paul-Sebastian Ungureanu   |
| 15      | Jeff Hostetler             |
| 15      | Brandon Williams           |
| 15      | Luke Diamand               |

4. Number of threads where the user participated

(This is measured by completing the transitive closure of In-Reply-To
edges into a new 'BaseMessage' column.)

    SELECT TOP 20
         COUNT(BaseMessage) as NumResponded
        ,Author
      FROM [git].[dbo].[mailing-list]
      WHERE [In-Reply-To] <> ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')
      GROUP BY Author
      ORDER BY NumResponded DESC

| NumResponded | Author                     |
|--------------|----------------------------|
| 2084         | Junio C Hamano             |
| 1596         | Stefan Beller              |
| 1211         | Jeff King                  |
| 1120         | Johannes Schindelin        |
| 1021         | Nguyễn Thái Ngọc Duy       |
| 799          | Eric Sunshine              |
| 797          | Ævar Arnfjörð Bjarmason    |
| 693