Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-27 Thread Johannes Schindelin
Hi Duy,

On Wed, 15 Aug 2018, Duy Nguyen wrote:

> On Tue, Aug 14, 2018 at 7:43 PM Derrick Stolee  wrote:
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By,
> > Reported-By, etc.).
> >
> >  Using git repo:
> >
> >  $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
> >
> >   66 Reviewed-by: Stefan Beller 
> >   22 Reviewed-by: Jeff King 
> >   19 Reviewed-by: Jonathan Tan 
> >   12 Helped-by: Eric Sunshine 
> >   11 Helped-by: Junio C Hamano 
> >    9 Helped-by: Jeff King
> >    8 Reviewed-by: Elijah Newren
> >    7 Reported-by: Ramsay Jones
> >    7 Acked-by: Johannes Schindelin
> >    7 Acked-by: Brandon Williams
> >    6 Reviewed-by: Eric Sunshine
> >    6 Helped-by: Johannes Schindelin
> >    5 Mentored-by: Christian Couder
> >    5 Acked-by: Johannes Schindelin
> >    4 Reviewed-by: Jonathan Nieder
> >    4 Reviewed-by: Johannes Schindelin
> >    4 Helped-by: Stefan Beller
> >    4 Helped-by: René Scharfe
> >    3 Reviewed-by: Martin Ågren
> >    3 Reviewed-by: Lars Schneider
> >
> >  (There does not appear to be enough density here to make a useful
> > metric.)
> 
> If your database keeps mail relationship (e.g. what mail is replied to
> what according to In-Reply-To header) then look for mail replies to
> patches. I think we have a rough picture who are active reviewers with
> that.

Not really, as there is a high percentage of "on a tangent" replies in
many, many patch threads.

Ciao,
Dscho

Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-27 Thread Johannes Schindelin
Hi Junio,

On Tue, 14 Aug 2018, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
> >
> >> On 8/13/2018 5:54 PM, Jeff King wrote:
> >> > So I try not to think too hard on metrics, and just use them to get a
> >> > rough view on who is active.
> >> 
> >> I've been very interested in measuring community involvement, with the
> >> knowledge that any metric is flawed and we should not ever say "this metric
> >> is how we measure the quality of a contributor". It can be helpful, though,
> >> to track some metrics and their change over time.
> >> 
> >> Here are a few measurements we can make:
> >
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
> 
> I do not think it covered e-mail at all, but there was git stats
> project several years ago (perhaps part of GSoC IIRC).
> 
> > I think I mentioned "surviving lines" elsewhere, which I do like this
> > (and almost certainly stole from Junio a long time ago):
> 
> Yeah, I recall that one as part of counting how many of 1244 lines
> Linus originally wrote still were in our codebase at around v1.6.0
> timeframe (the answer was ~220 IIRC) ;-)

And if you do not remember precisely, you can easily re-run `Linus` from
here: https://github.com/git/git/blob/todo/Linus

Ciao,
Dscho


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-15 Thread Duy Nguyen
On Tue, Aug 14, 2018 at 7:43 PM Derrick Stolee  wrote:
> 2. Number of other commit tag-lines (Reviewed-By, Helped-By,
> Reported-By, etc.).
>
>  Using git repo:
>
>  $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
>   66 Reviewed-by: Stefan Beller 
>   22 Reviewed-by: Jeff King 
>   19 Reviewed-by: Jonathan Tan 
>   12 Helped-by: Eric Sunshine 
>   11 Helped-by: Junio C Hamano 
>    9 Helped-by: Jeff King
>    8 Reviewed-by: Elijah Newren
>    7 Reported-by: Ramsay Jones
>    7 Acked-by: Johannes Schindelin
>    7 Acked-by: Brandon Williams
>    6 Reviewed-by: Eric Sunshine
>    6 Helped-by: Johannes Schindelin
>    5 Mentored-by: Christian Couder
>    5 Acked-by: Johannes Schindelin
>    4 Reviewed-by: Jonathan Nieder
>    4 Reviewed-by: Johannes Schindelin
>    4 Helped-by: Stefan Beller
>    4 Helped-by: René Scharfe
>    3 Reviewed-by: Martin Ågren
>    3 Reviewed-by: Lars Schneider
>
>  (There does not appear to be enough density here to make a useful
> metric.)

If your database keeps mail relationship (e.g. what mail is replied to
what according to In-Reply-To header) then look for mail replies to
patches. I think we have a rough picture who are active reviewers with
that.
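
A rough sketch of that idea, assuming the mail table has been exported
as tab-separated lines in the hypothetical column order message-id,
in-reply-to, author, subject. It counts, per author, the replies whose
direct parent is a patch mail (subject starting with "[PATCH" or
"[RFC PATCH") written by someone else. The function name and the
self-reply exclusion are illustrative assumptions, not something the
thread specifies:

```shell
# Input (stdin): TSV with columns message-id, in-reply-to, author, subject.
# Output: "count<TAB>author", highest count first.
count_patch_replies() {
  awk -F'\t' '
    { parent[$1] = $2; author[$1] = $3; subject[$1] = $4; ids[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        id = ids[i]; p = parent[id]
        # Count a reply only if its direct parent is a patch mail
        # that was sent by a different author.
        if (p != "" && (p in subject) &&
            subject[p] ~ /^\[(RFC )?PATCH/ && author[p] != author[id])
          n[author[id]]++
      }
      for (a in n) print n[a] "\t" a
    }' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Only direct replies to the patch itself are counted, so deeper
sub-threads (where tangents tend to live) are ignored; whether that is
a good proxy for review activity is a separate question.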
-- 
Duy


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-15 Thread Eric Wong
Jeff King  wrote:
> On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:
> > With the advent of public inbox, this is easy to obtain?
> 
> For our project, yes. But I was thinking of a tool that could be used
> for other projects, too.

Nothing prevents public-inbox from being adopted by other projects :)
Fwiw, Linux Foundation has LKML at https://lore.kernel.org/lkml


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
>
>> On 8/13/2018 5:54 PM, Jeff King wrote:
>> > So I try not to think too hard on metrics, and just use them to get a
>> > rough view on who is active.
>> 
>> I've been very interested in measuring community involvement, with the
>> knowledge that any metric is flawed and we should not ever say "this metric
>> is how we measure the quality of a contributor". It can be helpful, though,
>> to track some metrics and their change over time.
>> 
>> Here are a few measurements we can make:
>
> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

I do not think it covered e-mail at all, but there was git stats
project several years ago (perhaps part of GSoC IIRC).

> I think I mentioned "surviving lines" elsewhere, which I do like this
> (and almost certainly stole from Junio a long time ago):

Yeah, I recall that one as part of counting how many of 1244 lines
Linus originally wrote still were in our codebase at around v1.6.0
timeframe (the answer was ~220 IIRC) ;-)



Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:
> 
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
> 
> With the advent of public inbox, this is easy to obtain?

For our project, yes. But I was thinking of a tool that could be used
for other projects, too.

> > At one point I sent a patch series that would let shortlog group by
> > trailers. Nobody seemed all that interested and I didn't end up using it
> > for its original purpose, so I didn't polish it further.  But I'd be
> > happy to re-submit it if you think it would be useful.
> 
> I would think it is useful. Didn't Linus also ask for a related thing?
> https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/

He wanted grouping by committer, which we ended up adding as a separate
feature. I think there's some discussion of the trailer thing in that
thread.

-Peff


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:

> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

With the advent of public inbox, this is easy to obtain?

>
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> > etc.).
> >
> > Using git repo:
> >
> > $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
> At one point I sent a patch series that would let shortlog group by
> trailers. Nobody seemed all that interested and I didn't end up using it
> for its original purpose, so I didn't polish it further.  But I'd be
> happy to re-submit it if you think it would be useful.

I would think it is useful. Didn't Linus also ask for a related thing?
https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:

> On 8/13/2018 5:54 PM, Jeff King wrote:
> > So I try not to think too hard on metrics, and just use them to get a
> > rough view on who is active.
> 
> I've been very interested in measuring community involvement, with the
> knowledge that any metric is flawed and we should not ever say "this metric
> is how we measure the quality of a contributor". It can be helpful, though,
> to track some metrics and their change over time.
> 
> Here are a few measurements we can make:

Thanks, it was nice to see a more comprehensive list in one spot.

It would be neat to have a tool that presents all of these
automatically, but I think the email ones are pretty tricky (most people
don't have the whole list archive sitting around).

> 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> etc.).
> 
>     Using git repo:
> 
>     $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20

At one point I sent a patch series that would let shortlog group by
trailers. Nobody seemed all that interested and I didn't end up using it
for its original purpose, so I didn't polish it further.  But I'd be
happy to re-submit it if you think it would be useful.

The shell hackery here isn't too bad, but doing it internally is a
little faster, a little more robust (less parsing), and lets you show
more details about the commits themselves (e.g., who reviews whom).
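
Short of doing it inside shortlog, one way to cut down on the parsing
is to let git extract the trailers itself. This is only a sketch: it
assumes a Git new enough to support the "%(trailers:only,unfold)"
pretty-format placeholder, and the helper name count_trailers is made
up for illustration. Stripping the e-mail address is also a choice of
mine, so that entries differing only in address are merged:

```shell
# Reads trailer lines ("Key: Name <address>") on stdin and prints a
# count per "Key: Name", skipping Signed-off-by lines.
count_trailers() {
  grep -i -- '-by:' |
  grep -iv '^Signed-off-by:' |
  sed -e 's/ *<[^>]*> *$//' |
  sort | uniq -c | sort -rn
}

# Usage, inside a git checkout that has the junio/next ref:
#   git log --since=2018-01-01 --format='%(trailers:only,unfold)' junio/next |
#     count_trailers | head -n 20
```

Because only real trailer lines reach the pipeline, a "by:" that
happens to appear in the body of a commit message is no longer counted.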

> 3. Number of threads started by user.

You have "started" and "participated in". I guess one more would be
"closed", as in "solved a bug", but that is quite hard to tell without
looking at the content. Taking just the last person in a thread as the
closer means that an OP saying "thanks!" wrecks it. And somebody who
rants long enough that everybody else loses interest gets marked as a
closer. ;)

> If you have other ideas for fun measurements, then please let me know.

I think I mentioned "surviving lines" elsewhere, which I do like this
(and almost certainly stole from Junio a long time ago):

  # Obviously you can tweak this as you like, but the mass-imported bits
  # in compat and xdiff tend to skew the counts. It's possibly worth
  # counting language lines separately.
  git ls-files '*.c' '*.h' :^compat :^contrib :^xdiff |
  while read fn; do
    # eye candy
    echo >&2 "Blaming $fn..."

    # You can use more/fewer -C to dig more or less for code moves.
    # Possibly "-w" would help, though I doubt it shifts things more
    # than a few percent anyway.
    git blame -C --line-porcelain $fn
  done |
  perl -lne '/^author (.*)/ and print $1' |
  sort | uniq -c | sort -rn | head

The output right now is:

  35156 Junio C Hamano
  22207 Jeff King
  17466 Nguyễn Thái Ngọc Duy
  12005 Johannes Schindelin
  10259 Michael Haggerty
   9389 Linus Torvalds
   8318 Brandon Williams
   7776 Stefan Beller
   5947 Christian Couder
   4935 René Scharfe

which seems reasonable.

-Peff


Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Derrick Stolee

On 8/13/2018 5:54 PM, Jeff King wrote:

> So I try not to think too hard on metrics, and just use them to get a
> rough view on who is active.


I've been very interested in measuring community involvement, with the 
knowledge that any metric is flawed and we should not ever say "this 
metric is how we measure the quality of a contributor". It can be 
helpful, though, to track some metrics and their change over time.


Here are a few measurements we can make:

1. Number of (non-merge) commit author tag-lines.

    using git repo:

  $ git shortlog --no-merges --since 2017 -sne junio/next | head -n 20
   284  Nguyễn Thái Ngọc Duy 
   257  Jeff King 
   206  Stefan Beller 
   192  brian m. carlson 
   159  Brandon Williams 
   149  Junio C Hamano 
   137  Elijah Newren 
   116  René Scharfe 
   112  Johannes Schindelin 
   105  Ævar Arnfjörð Bjarmason 
    96  Jonathan Tan 
    93  SZEDER Gábor 
    78  Derrick Stolee 
    76  Martin Ågren 
    66  Michael Haggerty 
    61  Eric Sunshine 
    46  Christian Couder 
    36  Phillip Wood 
    35  Jonathan Nieder 
    33  Thomas Gummerer 

2. Number of other commit tag-lines (Reviewed-By, Helped-By, 
Reported-By, etc.).


    Using git repo:

    $ git log --since=2018-01-01 junio/next|grep by:|grep -v Signed-off-by:|sort|uniq -c|sort -nr|head -n 20


 66 Reviewed-by: Stefan Beller 
 22 Reviewed-by: Jeff King 
 19 Reviewed-by: Jonathan Tan 
 12 Helped-by: Eric Sunshine 
 11 Helped-by: Junio C Hamano 
  9 Helped-by: Jeff King 
  8 Reviewed-by: Elijah Newren 
  7 Reported-by: Ramsay Jones 
  7 Acked-by: Johannes Schindelin 
  7 Acked-by: Brandon Williams 
  6 Reviewed-by: Eric Sunshine 
  6 Helped-by: Johannes Schindelin 
  5 Mentored-by: Christian Couder 
  5 Acked-by: Johannes Schindelin 
  4 Reviewed-by: Jonathan Nieder 
  4 Reviewed-by: Johannes Schindelin 
  4 Helped-by: Stefan Beller 
  4 Helped-by: René Scharfe 
  3 Reviewed-by: Martin Ågren 
  3 Reviewed-by: Lars Schneider 

    (There does not appear to be enough density here to make a useful 
metric.)


3. Number of email messages sent.

    Using mailing list repo:

$ git shortlog --since 2017 -sne | head -n 20
  3749  Junio C Hamano 
  2213  Stefan Beller 
  2112  Jeff King 
  1106  Nguyễn Thái Ngọc Duy 
  1028  Johannes Schindelin 
   965  Ævar Arnfjörð Bjarmason 
   956  Brandon Williams 
   947  Eric Sunshine 
   890  Elijah Newren 
   753  brian m. carlson 
   677  Duy Nguyen 
   646  Jonathan Nieder 
   629  Derrick Stolee 
   545  Christian Couder 
   515  Jonathan Tan 
   425  Johannes Schindelin 
   425  Martin Ågren 
   420  Jeff Hostetler 
   420  SZEDER Gábor 
   363  Phillip Wood 

3. Number of threads started by user.

    (For this and the measurements below, I imported emails into a SQL 
table with columns [commit, author, date, message-id, in-reply-to, 
subject] and ran queries)


SELECT TOP 20
       COUNT(*) as NumSent
  ,[Author]
  FROM [git].[dbo].[mailing-list]
  WHERE [In-Reply-To] = ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')

GROUP BY [Author]
ORDER BY NumSent DESC

| NumSent | Author                   |
|---------|--------------------------|
| 76      | Junio C Hamano           |
| 64      | Stefan Beller            |
| 54      | Philip Oakley            |
| 50      | Nguyễn Thái Ngọc Duy     |
| 49      | Robert P. J. Day         |
| 47      | Christian Couder         |
| 36      | Ramsay Jones             |
| 34      | Elijah Newren            |
| 34      | SZEDER Gábor             |
| 33      | Johannes Schindelin      |
| 31      | Jeff King                |
| 30      | Ævar Arnfjörð Bjarmason  |
| 24      | Jonathan Tan             |
| 22      | Alban Gruin              |
| 22      | brian m. carlson         |
| 18      | Randall S. Becker        |
| 15      | Paul-Sebastian Ungureanu |
| 15      | Jeff Hostetler           |
| 15      | Brandon Williams         |
| 15      | Luke Diamand             |
4. Number of threads where the user participated

(This is measured by completing the transitive closure of In-Reply-To 
edges into a new 'BaseMessage' column.)
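
For readers who want to reproduce that step outside the database, here
is a minimal sketch. It assumes the table has been exported as
tab-separated "message-id<TAB>in-reply-to" lines (empty second field
for thread starters); the function name thread_roots and the export
format are hypothetical. Each message is mapped to the earliest
ancestor that is still present in the export:

```shell
# Input (stdin): TSV with columns message-id, in-reply-to.
# Output: "message-id<TAB>base-message" for every input line.
thread_roots() {
  awk -F'\t' '
    { parent[$1] = $2; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        id = order[i]; root = id; hops = 0
        # Follow In-Reply-To edges upward until we hit a message with
        # no parent, or a parent missing from the export. The hop
        # limit guards against malformed reply cycles.
        while (parent[root] != "" && (parent[root] in parent) && hops++ < NR)
          root = parent[root]
        print id "\t" root
      }
    }'
}
```

A thread starter maps to itself, and a reply whose parent never
reached the archive becomes the base of its own partial thread.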


SELECT TOP 20
       COUNT(BaseMessage) as NumResponded
  ,Author
  FROM [git].[dbo].[mailing-list]
  WHERE [In-Reply-To] <> ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')

GROUP BY Author
ORDER BY NumResponded DESC

| NumResponded | Author                  |
|--------------|-------------------------|
| 2084         | Junio C Hamano          |
| 1596         | Stefan Beller           |
| 1211         | Jeff King               |
| 1120         | Johannes Schindelin     |
| 1021         | Nguyễn Thái Ngọc Duy    |
| 799          | Eric Sunshine           |
| 797          | Ævar Arnfjörð Bjarmason |
| 693