Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:
 On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 IIRC, git-gui runs two blames, one without any -C and one with (I do
 not offhand recall how many -C it uses) to show both.

 'git blame' is a very expensive operation, perhaps we should add
 another option so users don't need to run two blames to find this.

 Yes, you could add a struct origin *suspect_in_the_same_file next
 to the current struct origin *suspect to the blame_entry in order
 to represent the origin you would get if you were to stop at a move
 across files event, and keep digging, when you are operating under
 -C -C or higher mode.

 No, this is what I meant:

 But that would not print the right line numbers, so perhaps:

Users of full-output may want to be able to see the same thing.

I have a suspicion that you misread what assign_blame() does.  The
first inner loop to find one suspect is really what it says.  It
loops over blame entries in the scoreboard but it is not about
finding one entry, but is to find one suspect.  The ent found
in the loop that you store in tmp_ent is no more special than any
other ent that shares the same suspect as its origin.

Imagine that your scoreboard originally has three blocks of text
(i.e. blame_entry) A, B, and C, and the current suspect for A and C
are the same, while we already know what the final guilty party for
B is (which may be some descendant of the suspect).

Once we find one suspect in the first inner loop, the call to
pass_blame() does everything it can do to exonerate that suspect.

It runs a single diff between the suspect and each of its parents to
find lines in both A and C that can be blamed to one of these
parents, and blame entries A and C are split into pieces, some of
which still have the same suspect as their origin, which are where
their lines originate from, while others are attributable to these
parents or their ancestors.

If you keep the original entry for A to do something special, like
printing what the original range of A was before it was split by
pass_blame(), wouldn't you need to do the same for C?  We will not
be visiting C later in the outer while(1) loop, as a single call to
pass_blame() for suspect we did when we found it in A has already
taken care of it.

I am not sure what you are trying to achieve with that found-guilty
logic, even if the save away in tmp_ent logic is changed to keep
all the blame entries that have suspect we are looking at as their
origin.  When the current suspect is found to have passed all lines
intact from its parents, we will see found-guilty left to be false.

How would it make the original blame_entry (perhaps halfway) guilty
in that situation?

 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct
 origin *suspect, int repeat)
   return 1;
  }

 -/*
 - * The blame_entry is found to be guilty for the range.  Mark it
 - * as such, and show it in incremental output.
 - */
 -static void found_guilty_entry(struct blame_entry *ent)
 +static void print_guilty_entry(struct blame_entry *ent)
  {
 - if (ent-guilty)
 - return;
 - ent-guilty = 1;
 - if (incremental) {
 - struct origin *suspect = ent-suspect;
 -
 - printf(%s %d %d %d\n,
 -sha1_to_hex(suspect-commit-object.sha1),
 -ent-s_lno + 1, ent-lno + 1, ent-num_lines);
 - emit_one_suspect_detail(suspect, 0);
 - write_filename_info(suspect-path);
 - maybe_flush_or_die(stdout, stdout);
 - }
 + struct origin *suspect = ent-suspect;
 +
 + printf(%s %d %d %d\n,
 + sha1_to_hex(suspect-commit-object.sha1),
 + ent-s_lno + 1, ent-lno + 1, ent-num_lines);
 + emit_one_suspect_detail(suspect, 0);
 + write_filename_info(suspect-path);
 + maybe_flush_or_die(stdout, stdout);
  }

  /*
 @@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int 
 opt)
   struct rev_info *revs = sb-revs;

   while (1) {
 - struct blame_entry *ent;
 + struct blame_entry *ent, tmp_ent;
   struct commit *commit;
   struct origin *suspect = NULL;
 + int found_guilty = 0;

   /* find one suspect to break down */
 - for (ent = sb-ent; !suspect  ent; ent = ent-next)
 - if (!ent-guilty)
 + for (ent = sb-ent; ent; ent = ent-next)
 + if (!ent-guilty) {
 + tmp_ent = *ent;
   suspect = ent-suspect;
 + break;
 + }
 +
   if (!suspect)
   return; /* all done */

 @@ -1564,9 +1560,20 @@ static 

Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:
 This doesn't make any sense:

 Ah, never mind, it's COPYING the one being modified, not EXTRACTING.

Yes.

The different levels of -C happens to correspond to the software
engineering practice from best to sloppy:

 - When you refactor to have a block of lines in a file , the
   original of that block would have come from another file (or the
   same file) you touched to remove duplication, so a single -C
   looks for an origin only from modified files (best).

 - When you start a new file, you may have to start from some
   boilerplate material that already exists in another file that is
   not (and does not have to be) changed in the commit you add that
   new file, so double -C -C looks for an origin of lines from other
   files, even unmodified ones, when looking at the lines in a new
   file (unfortunate but acceptable).

 - When you copy and paste without making any effort to refactor,
   you modify a file by adding new lines that match identically from
   another existing file, the latter of which does not change in the
   commit you do that copy and paste.  To find this, you need to
   look for an origin for any file from all the other files, and
   triple -C -C -C lets you do so (sloppy).

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Users of full-output may want to be able to see the same thing.

... but I agree we may be able to cheat if we only want to support
interactive, and we do want to find a way to cheat when we are
running interactive without having to store much more information in
each blame_entry.

 Imagine that your scoreboard originally has three blocks of text
 ...
 in that situation?

(I snipped everything I said in the previous reply only for
brevity but they are still relevant).

I _think_ one way to cheat without storing too much information in
each blame_entry is to first make a copy of all the original blame
entries that share the suspect, see if any line in the line range
represented by each of the blame entries ended up being blamed to an
origin with a path different from that of the suspect.  And print
those original blame entries that fits the criterion as additional
these are not the final blame as you are digging with the option to
detect copy across files, but at this commit this line appeared anew
as far as the origin-path is concerned output.

So I think you were going in the right direction (in other words,
the patch inserted new code to the right places), even though you
may have got the details in what the new code should do wrong.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Felipe Contreras
On Thu, May 23, 2013 at 11:54 AM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:
 On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 IIRC, git-gui runs two blames, one without any -C and one with (I do
 not offhand recall how many -C it uses) to show both.

 'git blame' is a very expensive operation, perhaps we should add
 another option so users don't need to run two blames to find this.

 Yes, you could add a struct origin *suspect_in_the_same_file next
 to the current struct origin *suspect to the blame_entry in order
 to represent the origin you would get if you were to stop at a move
 across files event, and keep digging, when you are operating under
 -C -C or higher mode.

 No, this is what I meant:

 But that would not print the right line numbers, so perhaps:

 Users of full-output may want to be able to see the same thing.

 I have a suspicion that you misread what assign_blame() does.  The
 first inner loop to find one suspect is really what it says.  It
 loops over blame entries in the scoreboard but it is not about
 finding one entry, but is to find one suspect.  The ent found
 in the loop that you store in tmp_ent is no more special than any
 other ent that shares the same suspect as its origin.

It is stored because it's the only structure that has the line
numbers. If the blame is passed to another 'ent', the original line
numbers are gone. We could manually copy the line numbers to three
variables, or we could copy the whole thing to one variable. The rest
of the 'ent' doesn't matter.

 Imagine that your scoreboard originally has three blocks of text
 (i.e. blame_entry) A, B, and C, and the current suspect for A and C
 are the same, while we already know what the final guilty party for
 B is (which may be some descendant of the suspect).

I don't see how that's possible, but whatever.

 Once we find one suspect in the first inner loop, the call to
 pass_blame() does everything it can do to exonerate that suspect.

 It runs a single diff between the suspect and each of its parents to
 find lines in both A and C that can be blamed to one of these
 parents, and blame entries A and C are split into pieces, some of
 which still have the same suspect as their origin, which are where
 their lines originate from, while others are attributable to these
 parents or their ancestors.

 If you keep the original entry for A to do something special, like
 printing what the original range of A was before it was split by
 pass_blame(), wouldn't you need to do the same for C?

tmp_ent is only used when there's no entry taken the guilt away from
A. If the blame entry of A is split, and the result of the split has a
different filename, found_guilty() would not be called, and thus we
won't show the blame entry, and we want to show it.

And yes, we would need to do the same for C, and we would once the
turn of C comes along. If it's possible that A descendant from A takes
the blame for C, then sure, we would need to do ensure C is printed
before it's gone, but pass_blame(A) would not destroy C.

 We will not
 be visiting C later in the outer while(1) loop, as a single call to
 pass_blame() for suspect we did when we found it in A has already
 taken care of it.

It took care of A's suspect, but not C. Isn't it possible that C is
split further and creates another blame entry? Either way, in
--incremental we want to print C as well, whether it's the guilty one
or not.

 I am not sure what you are trying to achieve with that found-guilty
 logic, even if the save away in tmp_ent logic is changed to keep
 all the blame entries that have suspect we are looking at as their
 origin.

 When the current suspect is found to have passed all lines
 intact from its parents, we will see found-guilty left to be false.

What? When a 'blame entry' passes the whole blame to a parent,
found_guilty is false, so we print the entry, which is exactly what we
want to do.

 How would it make the original blame_entry (perhaps halfway) guilty
 in that situation?

I thought 'guilty' meant that it was guilty of adding those lines to
the 'final' text, so it ends up in the non-incremental blame. In that
sense those blame entries are not guilty.

But they are still blame entries, and we want to print all blame
entries in incremental, guilty or not.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 Imagine that your scoreboard originally has three blocks of text
 (i.e. blame_entry) A, B, and C, and the current suspect for A and C
 are the same, while we already know what the final guilty party for
 B is (which may be some descendant of the suspect).

 I don't see how that's possible, but whatever.

The tree in your latest commit HEAD has a file with block of text A
followed by block of text B followed by block of text C.  The latest
commit was the one that added B (perhaps new lines were inserted, or
perhaps existing contiguous block of text was replaced, there is no
difference between these two cases).  You start digging the history
of this file from HEAD.

Your scoreboard begins with a single blame-entry that covers all
three segments, with its suspect set to HEAD.  Then pass_blame()
discovers that top and bottom segments are older than this latest
commit, and splits the originally-one blame entry into three blame
entries.  The first and the third blame entries cover the block A
and the block C respectively, and their suspect fields are both set
to HEAD^.  The second blame entry covers the block B and its suspect
does not change (it still is HEAD).  Then it returns to the caller.

The caller of pass_blame() looks at the scoreboard and finds these
three blame entries.  The second one's supect is still the original
suspect the caller asked pass_blame() to exonerate blames for, and
the suspect failed to do so for block B.  The final blame for the
middle block is now known to be HEAD.

After all of the above, the next iteration of while(1) loop begins.
That is how you arrive to the whatever situation.  You have three
blame entries, A, B and C, and suspect of A and C are the same,
while B has a different suspect.

Then in that next iteration, we pick blame entry for A and decide
to see if HEAD^, which is the suspect for A, can be exonerated of
blames for _any_ (not just A) blame entry it currently is suspected
for.  We call pass_blame() and it will find and process both A and
C with a single git diff-tree, attempting to pass blame to HEAD^^
and its ancestors.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Felipe Contreras
On Thu, May 23, 2013 at 4:52 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 Imagine that your scoreboard originally has three blocks of text
 (i.e. blame_entry) A, B, and C, and the current suspect for A and C
 are the same, while we already know what the final guilty party for
 B is (which may be some descendant of the suspect).

 I don't see how that's possible, but whatever.

 The tree in your latest commit HEAD has a file with block of text A
 followed by block of text B followed by block of text C.  The latest
 commit was the one that added B (perhaps new lines were inserted, or
 perhaps existing contiguous block of text was replaced, there is no
 difference between these two cases).  You start digging the history
 of this file from HEAD.

 Your scoreboard begins with a single blame-entry that covers all
 three segments, with its suspect set to HEAD.  Then pass_blame()
 discovers that top and bottom segments are older than this latest
 commit, and splits the originally-one blame entry into three blame
 entries.  The first and the third blame entries cover the block A
 and the block C respectively, and their suspect fields are both set
 to HEAD^.  The second blame entry covers the block B and its suspect
 does not change (it still is HEAD).  Then it returns to the caller.

 The caller of pass_blame() looks at the scoreboard and finds these
 three blame entries.  The second one's supect is still the original
 suspect the caller asked pass_blame() to exonerate blames for, and
 the suspect failed to do so for block B.  The final blame for the
 middle block is now known to be HEAD.

 After all of the above, the next iteration of while(1) loop begins.
 That is how you arrive to the whatever situation.  You have three
 blame entries, A, B and C, and suspect of A and C are the same,
 while B has a different suspect.

 Then in that next iteration, we pick blame entry for A and decide
 to see if HEAD^, which is the suspect for A, can be exonerated of
 blames for _any_ (not just A) blame entry it currently is suspected
 for.  We call pass_blame() and it will find and process both A and
 C with a single git diff-tree, attempting to pass blame to HEAD^^
 and its ancestors.

All right, my code still works in that situation.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 Then in that next iteration, we pick blame entry for A and decide
 to see if HEAD^, which is the suspect for A, can be exonerated of
 blames for _any_ (not just A) blame entry it currently is suspected
 for.  We call pass_blame() and it will find and process both A and
 C with a single git diff-tree, attempting to pass blame to HEAD^^
 and its ancestors.

 All right, my code still works in that situation.

When HEAD^ was processed, pass_blame() finds that the first line in
A is attributable to HEAD^ (our current suspect) while the rest were
copied from a different file in HEAD^^ .  All lines in C are found
to be copy from a different file in HEAD^^.

Then your scoreboard would have:

  1. a blame entry that failed to pass blame to parents of HEAD^ (the
 first line in A), which still has suspect == HEAD^

  2. a blame entry that covers the remainder of A, blaming HEAD^^.

  3. a blame entry that covers all of B, whose final guilty party is
 known already.

  4. a blame entry that covers all of C, blaming a different file in
 HEAD^^.

Your Take responsibility loop goes over these blame entries, sets
found_guilty to true because of the first blame entry (the first
line of A), and calls print_guilty_entry() for blame entry 1,
showing that HEAD^ is guilty for the first line of A.

After the loop, your if we did not find anybody guilty logic does
not kick in, and the original line range for block A you saved in
tmp_ent is not used.

You lost the fact that the second and remainder of A were in this
file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were
moved by HEAD^ from elsewhere).

The fact that HEAD^ touched _something_ is not lost, so if _all_ you
care about is list all the commits, but I do not care about what
line range was modified how, you can claim it working, but that
is a very limited definition of working and is not very reusable
or extensible in order to help those like gitk that currently have
to run two separate blames.  They need an output that lets them tell
between

 - this is the earliest we saw these lines in the path (it may have
   been copied from another path, but this entry in the incremental
   output stream is not about that final blame); and

 - this is the final blame where these lines came from, possibly
   from different path

and coalesce both kind of origin..

Also the fact that the entire C was copied from elsewhere by HEAD^
is lost but that is the same issue.  The next round will not find
any blame entry that is !ent-guity because the call to pass_blame()
for HEAD^ already handled the final blame not just for blame entries
1 and 2, but also for 4 during this round.

If the change in HEAD^ in the above example were to copy the whole A
and C from a different file, then your !found_guilty logic would
kick in and say all lines of A where copied from elsewhere in HEAD^,
but again we would not learn the same information for C.

I do not think I care only about a single bit per commit, i.e. if
the commit touched the path; screw everybody else who wants to
actually know where each line came from can be called working.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Felipe Contreras
On Thu, May 23, 2013 at 5:44 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 Then in that next iteration, we pick blame entry for A and decide
 to see if HEAD^, which is the suspect for A, can be exonerated of
 blames for _any_ (not just A) blame entry it currently is suspected
 for.  We call pass_blame() and it will find and process both A and
 C with a single git diff-tree, attempting to pass blame to HEAD^^
 and its ancestors.

 All right, my code still works in that situation.

 When HEAD^ was processed, pass_blame() finds that the first line in
 A is attributable to HEAD^ (our current suspect) while the rest were
 copied from a different file in HEAD^^ .  All lines in C are found
 to be copy from a different file in HEAD^^.

 Then your scoreboard would have:

   1. a blame entry that failed to pass blame to parents of HEAD^ (the
  first line in A), which still has suspect == HEAD^

   2. a blame entry that covers the remainder of A, blaming HEAD^^.

   3. a blame entry that covers all of B, whose final guilty party is
  known already.

   4. a blame entry that covers all of C, blaming a different file in
  HEAD^^.

 Your Take responsibility loop goes over these blame entries, sets
 found_guilty to true because of the first blame entry (the first
 line of A), and calls print_guilty_entry() for blame entry 1,
 showing that HEAD^ is guilty for the first line of A.

 After the loop, your if we did not find anybody guilty logic does
 not kick in, and the original line range for block A you saved in
 tmp_ent is not used.

 You lost the fact that the second and remainder of A were in this
 file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were
 moved by HEAD^ from elsewhere).

 The fact that HEAD^ touched _something_ is not lost, so if _all_ you
 care about is list all the commits, but I do not care about what
 line range was modified how, you can claim it working,

The line range printed is correct.

 but that
 is a very limited definition of working and is not very reusable
 or extensible in order to help those like gitk that currently have
 to run two separate blames.  They need an output that lets them tell
 between

  - this is the earliest we saw these lines in the path (it may have
been copied from another path, but this entry in the incremental
output stream is not about that final blame); and

  - this is the final blame where these lines came from, possibly
from different path

 and coalesce both kind of origin..

They can already do that by looking at the lines; if they get replaced
by another entry; they are not guilty.

Or we can add a 'guilty' line to the incremental output, that's trivial.

 Also the fact that the entire C was copied from elsewhere by HEAD^
 is lost but that is the same issue.  The next round will not find
 any blame entry that is !ent-guity because the call to pass_blame()
 for HEAD^ already handled the final blame not just for blame entries
 1 and 2, but also for 4 during this round.

No, that's not true. If it's a different file the blame entry is not
found guilty.

 If the change in HEAD^ in the above example were to copy the whole A
 and C from a different file, then your !found_guilty logic would
 kick in and say all lines of A where copied from elsewhere in HEAD^,
 but again we would not learn the same information for C.

We would, when it's the turn for C, which is not guilty at this point.

 I do not think I care only about a single bit per commit, i.e. if
 the commit touched the path; screw everybody else who wants to
 actually know where each line came from can be called working.

They can know where each line came from; the lines of each entry are
printed properly; that's the whole point of 'tmp_ent'.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-23 Thread Felipe Contreras
On Thu, May 23, 2013 at 6:47 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 If the change in HEAD^ in the above example were to copy the whole A
 and C from a different file, then your !found_guilty logic would
 kick in and say all lines of A where copied from elsewhere in HEAD^,
 but again we would not learn the same information for C.

 We would, when it's the turn for C, which is not guilty at this point.

 In _this_ round of the while(1) loop, pass_blame_to_parent() gets
 the scoreboard and two origins (HEAD^ that we are looking at and
 HEAD^^ that is its parent); it does not even know what blame entry
 this request came from.

 It runs a single diff using diff_hunks(), and asks blame_chunk() to
 split all the blame entries in the scoreboard that suspect it is
 looking at may be guilty for.  Blame entry for A and C are both
 processed exactly the same way when HEAD^ is given to pass_blame()
 for the first time, which is when assign_blame() decided to call it
 with HEAD^ because it happened to have seen A before seeing C.  At
 that point, both A and C are processed, and the post-processing loop
 Take responsibility for the remaining will clean up remnants from
 both A and C.  After this round ends, the suspect for A and C are
 both set to HEAD^^.

 In the next round of the while(1) loop, C already forgot that its
 line movement happened in HEAD^.  Its suspect is now HEAD^^.  When
 it's the turn for C [*1*], you can say These lines originate in
 that different path in HEAD^^, but it is too late to say But the
 first time they appeared in the original file was HEAD^ (which is
 when they were moved from the different path in HEAD^^), isn't it?

If that's the case then we'll need another list of blame entries where
each discarded blame entry goes. Given the luck of my previous obvious
patches, I'm not interested in implementing this non-obvious one in
the least.

The point is 'git related' should do -C -C -C, if 'git blame' doesn't
throw the right output, that's a bug in 'git blame' not 'git related.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 diff --git a/contrib/related/git-related b/contrib/related/git-related
 new file mode 100755
 index 000..b96dcdd
 --- /dev/null
 +++ b/contrib/related/git-related
 @@ -0,0 +1,124 @@
 +#!/usr/bin/env ruby
 +
 +# This script finds people that might be interested in a patch
 +# usage: git related file
 +
 +$since = '5-years-ago'
 +$min_percent = 10
 +
 +def fmt_person(name, email)
 +  '%s %s' % [name, email]
 +end

Micronit.  I suspect you do not need this helper, unless later
patches start using it.

 +  def import
 +return if @items.empty?
 +File.popen(%w[git cat-file --batch], 'r+') do |p|
 +  p.write(@items.keys.join(\n))
 +  p.close_write
 +  p.each do |line|
 +if line =~ /^(\h{40}) commit (\d+)/
 +  id, len = $1, $2
 +  data = p.read($2.to_i)
 +  @items[id].parse(data)
 +end
 +  end
 +end
 +  end
 +
 +  def get_blame(source, start, len, from)
 +return if len == 0
 +len ||= 1
 +File.popen(['git', 'blame', '--incremental', '-CCC',

I am torn on the hardcoded use of -CCC here.

Depending on the nature of the change in question, it may match well
or worse to what you are trying to find out.  When you are trying to
say What were you smoking when you implemented this broken logic?,
using -C may be good, but when your question is Even though all the
callers of this function live in that other file, somebody moved
this function that used to be file static in that file to here and
made it public. Why?, you do not want to use -C.

I am reasonably sure that in the finished code later in the series
it will become configurable, but a fallback default is better to be
not so expensive one.

 +   '-L', '%u,+%u' % [start, len],
 +   '--since', $since, from + '^',

Is from unconditionally set?

Perhaps that nil + '^' magically disappear and this code is relying
on that, but it smells like a too much magic to me.

 +   '--', source]) do |p|
 +  p.each do |line|
 +if line =~ /^(\h{40})/
 +  id = $
 +  @items[id] = Commit.new(id)
 +end
 +  end
 +end
 +  end
 +
 +  def from_patch(file)
 +from = source = nil
 +File.open(file) do |f|
 +  f.each do |line|
 +case line
 +when /^From (\h+) (.+)$/
 +  from = $1
 +when /^---\s+(\S+)/
 +  source = $1 != '/dev/null' ? $1[2..-1] : nil
 +when /^@@ -(\d+)(?:,(\d+))?/
 +  get_blame(source, $1, $2, from)
 +end

Makes sense to start from the preimage so that you can find out who
wrote the original block of lines your patch is removing.

But then if source is /dev/null, wouldn't you be able to stop
without running blame at all?  You know the patch is creating a new
file at that point and there is nobody to point a finger at.

 +  end
 +end
 +  end
 +
 +end
 +
 +exit 1 if ARGV.size != 1
 +
 +commits = Commits.new
 +commits.from_patch(ARGV[0])
 +commits.import
 +
 +count_per_person = Hash.new(0)
 +
 +commits.each do |id, commit|
 +  commit.persons.each do |person|
 +count_per_person[person] += 1
 +  end
 +end
 +
 +count_per_person.each do |person, count|
 +  percent = count.to_f * 100 / commits.size
 +  next if percent  $min_percent
 +  puts person
 +end
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Felipe Contreras felipe.contre...@gmail.com writes:

 diff --git a/contrib/related/git-related b/contrib/related/git-related
 new file mode 100755
 index 000..b96dcdd
 --- /dev/null
 +++ b/contrib/related/git-related
 @@ -0,0 +1,124 @@
 +#!/usr/bin/env ruby
 +
 +# This script finds people that might be interested in a patch
 +# usage: git related file
 +
 +$since = '5-years-ago'
 +$min_percent = 10
 +
 +def fmt_person(name, email)
 +  '%s %s' % [name, email]
 +end

 Micronit.  I suspect you do not need this helper, unless later
 patches start using it.

Not that matters terribly, but later patches start using it in a
different way may be needed as a clarification.

The current two callers capture both %s %s as two separate
groups but they do not need to; they can do (%s %s) and pass $1
to this function, I think.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 2:23 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 diff --git a/contrib/related/git-related b/contrib/related/git-related
 new file mode 100755
 index 000..b96dcdd
 --- /dev/null
 +++ b/contrib/related/git-related
 @@ -0,0 +1,124 @@
 +#!/usr/bin/env ruby
 +
 +# This script finds people that might be interested in a patch
 +# usage: git related file
 +
 +$since = '5-years-ago'
 +$min_percent = 10
 +
 +def fmt_person(name, email)
 +  '%s %s' % [name, email]
 +end

 Micronit.  I suspect you do not need this helper, unless later
 patches start using it.

It's not *needed*, but it makes if fulfills the role of a function: to
avoid typing that code in multiple places.

 +  def import
 +return if @items.empty?
 +File.popen(%w[git cat-file --batch], 'r+') do |p|
 +  p.write(@items.keys.join(\n))
 +  p.close_write
 +  p.each do |line|
 +if line =~ /^(\h{40}) commit (\d+)/
 +  id, len = $1, $2
 +  data = p.read($2.to_i)
 +  @items[id].parse(data)
 +end
 +  end
 +end
 +  end
 +
 +  def get_blame(source, start, len, from)
 +return if len == 0
 +len ||= 1
 +File.popen(['git', 'blame', '--incremental', '-CCC',

 I am torn on the hardcoded use of -CCC here.

 Depending on the nature of the change in question, it may match well
 or worse to what you are trying to find out.  When you are trying to
 say What were you smoking when you implemented this broken logic?,
 using -C may be good, but when your question is Even though all the
 callers of this function live in that other file, somebody moved
 this function that used to be file static in that file to here and
 made it public. Why?, you do not want to use -C.

 I am reasonably sure that in the finished code later in the series
 it will become configurable, but a fallback default is better to be
 not so expensive one.

The script's purpose is to find related commits, -CCC does that, does it not?

 +   '-L', '%u,+%u' % [start, len],
 +   '--since', $since, from + '^',

 Is from unconditionally set?

 Perhaps that nil + '^' magically disappear and this code is relying
 on that, but it smells like a too much magic to me.

I personally don't care. You decide what's the behavior when no 'From
' line is available in the patch. I don't see the point in worrying
about non-git patches.

 +   '--', source]) do |p|
 +  p.each do |line|
 +if line =~ /^(\h{40})/
 +  id = $
 +  @items[id] = Commit.new(id)
 +end
 +  end
 +end
 +  end
 +
 +  def from_patch(file)
 +from = source = nil
 +File.open(file) do |f|
 +  f.each do |line|
 +case line
 +when /^From (\h+) (.+)$/
 +  from = $1
 +when /^---\s+(\S+)/
 +  source = $1 != '/dev/null' ? $1[2..-1] : nil
 +when /^@@ -(\d+)(?:,(\d+))?/
 +  get_blame(source, $1, $2, from)
 +end

 Makes sense to start from the preimage so that you can find out who
 wrote the original block of lines your patch is removing.

 But then if source is /dev/null, wouldn't you be able to stop
 without running blame at all?  You know the patch is creating a new
 file at that point and there is nobody to point a finger at.

A patch can touch multiple files.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 Depending on the nature of the change in question, it may match well
 or worse to what you are trying to find out.  When you are trying to
 say What were you smoking when you implemented this broken logic?,
 using -C may be good, but when your question is Even though all the
 callers of this function live in that other file, somebody moved
 this function that used to be file static in that file to here and
 made it public. Why?, you do not want to use -C.

 I am reasonably sure that in the finished code later in the series
 it will become configurable, but a fallback default is better to be
 not so expensive one.

 The script's purpose is to find related commits, -CCC does that, does it not?

As I already said in the above, the answer is no, when you are
trying to find who moved the code from the original place.

 Makes sense to start from the preimage so that you can find out who
 wrote the original block of lines your patch is removing.

 But then if source is /dev/null, wouldn't you be able to stop
 without running blame at all?  You know the patch is creating a new
 file at that point and there is nobody to point a finger at.

 A patch can touch multiple files.

So?

What line range would you be feeding blame with, for such a file
creation event?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 Depending on the nature of the change in question, it may match well
 or worse to what you are trying to find out.  When you are trying to
 say What were you smoking when you implemented this broken logic?,
 using -C may be good, but when your question is Even though all the
 callers of this function live in that other file, somebody moved
 this function that used to be file static in that file to here and
 made it public. Why?, you do not want to use -C.

 I am reasonably sure that in the finished code later in the series
 it will become configurable, but a fallback default is better to be
 not so expensive one.

 The script's purpose is to find related commits, -CCC does that, does it not?

 As I already said in the above, the answer is no, when you are
 trying to find who moved the code from the original place.

But I'm not trying to find who moved the code, I'm trying to find
related commits; hence the name 'git related'.

The person who moved the code will be on the list regardless, so 'git
related' does find the person who moved the code indirectly; by
finding the persons that touched the code.

 Makes sense to start from the preimage so that you can find out who
 wrote the original block of lines your patch is removing.

 But then if source is /dev/null, wouldn't you be able to stop
 without running blame at all?  You know the patch is creating a new
 file at that point and there is nobody to point a finger at.

 A patch can touch multiple files.

 So?

 What line range would you be feeding blame with, for such a file
 creation event?

I thought it was skipping git blame in such cases, perhaps it got
dropped in one of the other countless versions of this script.

Good catch.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 ...
 As I already said in the above, the answer is no, when you are
 trying to find who moved the code from the original place.

 But I'm not trying to find who moved the code, I'm trying to find
 related commits; hence the name 'git related'.

 The person who moved the code will be on the list regardless,

That is exactly the point I have been trying to raise.  Does the
person appear in the list when you run blame with -CCC?  You ask for
the body of the function, and the -C mode of blame sees through the
block-of-line movement across file boundaries, and goes straight to
the last commit that touched the body of the function in its original
file, no?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 ...
 As I already said in the above, the answer is no, when you are
 trying to find who moved the code from the original place.

 But I'm not trying to find who moved the code, I'm trying to find
 related commits; hence the name 'git related'.

 The person who moved the code will be on the list regardless,

 That is exactly the point I have been trying to raise.  Does the
 person appear in the list when you run blame with -CCC?  You ask for

s/person/commit/;

 the body of the function, and the -C mode of blame sees through the
 block-of-line movement across file boundaries, and goes straight to
 the last commit that touched the body of the function in its original
 file, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 5:53 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 ...
 As I already said in the above, the answer is no, when you are
 trying to find who moved the code from the original place.

 But I'm not trying to find who moved the code, I'm trying to find
 related commits; hence the name 'git related'.

 The person who moved the code will be on the list regardless,

 That is exactly the point I have been trying to raise.  Does the
 person appear in the list when you run blame with -CCC?  You ask for
 the body of the function, and the -C mode of blame sees through the
 block-of-line movement across file boundaries, and goes straight to
 the last commit that touched the body of the function in its original
 file, no?

I'm not familiar how the different -C options work, but I'm testing
right now and I see the commit that moved a file with both -C and
-CCC, but strangely enough, not if it's the previous commit (with
both).

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 The person who moved the code will be on the list regardless,

 That is exactly the point I have been trying to raise.  Does the
 person appear in the list when you run blame with -CCC?  You ask for

 s/person/commit/;

 the body of the function, and the -C mode of blame sees through the
 block-of-line movement across file boundaries, and goes straight to
 the last commit that touched the body of the function in its original
 file, no?

-- 8 --
cd /var/tmp/
git init blame
cd blame

cp /src/git/COPYING COPYING
git add COPYING
git commit -m initial

sed -ne 120,140p COPYING EXTRACTING
git add EXTRACTING
git commit -m second

git blame -C -C -C EXTRACTING
-- 8 --

will show that all lines from EXTRACTING came from the original
revision of COPYING, and we will miss the line-move event.

blame -C with a single -C does not look at the files that did not
change in the commit that added these lines to EXTRACTING
(i.e. COPYING), so the digging stops there.

After this, if you do this:

-- 8 --
echo COPYING
git commit --amend -a --no-edit

git blame -C EXTRACTING
-- 8 --

then the commit that did add these lines to EXTRACTING touched
COPYING, and the origin of these lines are traced to it (this is
designed to be useful in a typical refactor by moving code;
cut and paste without changing the original people need heavier
copy detection with more -C).

IIRC, git-gui runs two blames, one without any -C and one with (I do
not offhand recall how many -C it uses) to show both.

HTH.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 IIRC, git-gui runs two blames, one without any -C and one with (I do
 not offhand recall how many -C it uses) to show both.

 'git blame' is a very expensive operation, perhaps we should add
 another option so users don't need to run two blames to find this.

Yes, you could add a struct origin *suspect_in_the_same_file next
to the current struct origin *suspect to the blame_entry in order
to represent the origin you would get if you were to stop at a move
across files event, and keep digging, when you are operating under
-C -C or higher mode.

That would make the result just as expensive as a single run of -C
-C (or higher mode) [*1*] without having to run an extra git
blame without any -C (even though that mode is much cheaper).

Representing the result for a terminal with reasonable width might
become unwieldy, but for --porcelain output format that should not
be a problem.


[Footnote]

*1* It would make it slightly more expensive than the current code,
as it is very likely that you have to split a single blame_entry
into many separate pieces.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
This doesn't make any sense:

---
cd /tmp
rm -rf blame

git init blame
cd blame

cp ~/dev/git/COPYING COPYING
git add COPYING
git commit -m initial

sed -ne 120,140p COPYING EXTRACTING
git add EXTRACTING
git commit -m second

git log --oneline
git blame -C EXTRACTING

echo COPYING
git commit --amend -a --no-edit

git log --oneline
git blame -C EXTRACTING
---

Why would the first instance find the blame in commit #2, and the
second one at commit #1? If anything, the second instance should have
more trouble finding the original commit.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 This doesn't make any sense:

Ah, never mind, it's COPYING the one being modified, not EXTRACTING.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 IIRC, git-gui runs two blames, one without any -C and one with (I do
 not offhand recall how many -C it uses) to show both.

 'git blame' is a very expensive operation, perhaps we should add
 another option so users don't need to run two blames to find this.

 Yes, you could add a struct origin *suspect_in_the_same_file next
 to the current struct origin *suspect to the blame_entry in order
 to represent the origin you would get if you were to stop at a move
 across files event, and keep digging, when you are operating under
 -C -C or higher mode.

No, this is what I meant:

--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1509,16 +1509,6 @@ static void found_guilty_entry(struct blame_entry *ent)
if (ent-guilty)
return;
ent-guilty = 1;
-   if (incremental) {
-   struct origin *suspect = ent-suspect;
-
-   printf(%s %d %d %d\n,
-  sha1_to_hex(suspect-commit-object.sha1),
-  ent-s_lno + 1, ent-lno + 1, ent-num_lines);
-   emit_one_suspect_detail(suspect, 0);
-   write_filename_info(suspect-path);
-   maybe_flush_or_die(stdout, stdout);
-   }
 }

 /*
@@ -1536,12 +1526,24 @@ static void assign_blame(struct scoreboard *sb, int opt)
struct origin *suspect = NULL;

/* find one suspect to break down */
-   for (ent = sb-ent; !suspect  ent; ent = ent-next)
-   if (!ent-guilty)
+   for (ent = sb-ent; ent; ent = ent-next)
+   if (!ent-guilty) {
suspect = ent-suspect;
+   break;
+   }
+
if (!suspect)
return; /* all done */

+   if (incremental 
!is_null_sha1(suspect-commit-object.sha1)) {
+   printf(%s %d %d %d\n,
+
sha1_to_hex(suspect-commit-object.sha1),
+   ent-s_lno + 1, ent-lno + 1,
ent-num_lines);
+   emit_one_suspect_detail(suspect, 0);
+   write_filename_info(suspect-path);
+   maybe_flush_or_die(stdout, stdout);
+   }
+
/*
 * We will use this suspect later in the loop,
 * so hold onto it in the meantime.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] Add new git-related helper to contrib

2013-05-22 Thread Felipe Contreras
On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 IIRC, git-gui runs two blames, one without any -C and one with (I do
 not offhand recall how many -C it uses) to show both.

 'git blame' is a very expensive operation, perhaps we should add
 another option so users don't need to run two blames to find this.

 Yes, you could add a struct origin *suspect_in_the_same_file next
 to the current struct origin *suspect to the blame_entry in order
 to represent the origin you would get if you were to stop at a move
 across files event, and keep digging, when you are operating under
 -C -C or higher mode.

 No, this is what I meant:

But that would not print the right line numbers, so perhaps:

--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct
origin *suspect, int repeat)
return 1;
 }

-/*
- * The blame_entry is found to be guilty for the range.  Mark it
- * as such, and show it in incremental output.
- */
-static void found_guilty_entry(struct blame_entry *ent)
+static void print_guilty_entry(struct blame_entry *ent)
 {
-   if (ent-guilty)
-   return;
-   ent-guilty = 1;
-   if (incremental) {
-   struct origin *suspect = ent-suspect;
-
-   printf(%s %d %d %d\n,
-  sha1_to_hex(suspect-commit-object.sha1),
-  ent-s_lno + 1, ent-lno + 1, ent-num_lines);
-   emit_one_suspect_detail(suspect, 0);
-   write_filename_info(suspect-path);
-   maybe_flush_or_die(stdout, stdout);
-   }
+   struct origin *suspect = ent-suspect;
+
+   printf(%s %d %d %d\n,
+   sha1_to_hex(suspect-commit-object.sha1),
+   ent-s_lno + 1, ent-lno + 1, ent-num_lines);
+   emit_one_suspect_detail(suspect, 0);
+   write_filename_info(suspect-path);
+   maybe_flush_or_die(stdout, stdout);
 }

 /*
@@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int opt)
struct rev_info *revs = sb-revs;

while (1) {
-   struct blame_entry *ent;
+   struct blame_entry *ent, tmp_ent;
struct commit *commit;
struct origin *suspect = NULL;
+   int found_guilty = 0;

/* find one suspect to break down */
-   for (ent = sb-ent; !suspect  ent; ent = ent-next)
-   if (!ent-guilty)
+   for (ent = sb-ent; ent; ent = ent-next)
+   if (!ent-guilty) {
+   tmp_ent = *ent;
suspect = ent-suspect;
+   break;
+   }
+
if (!suspect)
return; /* all done */

@@ -1564,9 +1560,20 @@ static void assign_blame(struct scoreboard *sb, int opt)
commit-object.flags |= UNINTERESTING;

/* Take responsibility for the remaining entries */
-   for (ent = sb-ent; ent; ent = ent-next)
-   if (same_suspect(ent-suspect, suspect))
-   found_guilty_entry(ent);
+   for (ent = sb-ent; ent; ent = ent-next) {
+   if (same_suspect(ent-suspect, suspect)) {
+   if (ent-guilty)
+   continue;
+   found_guilty = ent-guilty = 1;
+   if (incremental)
+   print_guilty_entry(ent);
+   }
+   }
+
+   if (incremental  !found_guilty 
+   !is_null_sha1(suspect-commit-object.sha1))
+   print_guilty_entry(tmp_ent);
+
origin_decref(suspect);

if (DEBUG) /* sanity */

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6] Add new git-related helper to contrib

2013-05-19 Thread Felipe Contreras
This script find people that might be interested in a patch, by going
back through the history for each single hunk modified, and finding
people that reviewed, acknowledge, signed, or authored the code the
patch is modifying.

It does this by running 'git blame' incrementally on each hunk, and then
parsing the commit message. After gathering all the relevant people, it
groups them to show what exactly was their role when the participated in
the development of the relevant commit, and on how many relevant commits
they participated. They are only displayed if they pass a minimum
threshold of participation.

For example:

  % git related 0001-remote-hg-trivial-cleanups.patch
  Felipe Contreras felipe.contre...@gmail.com
  Jeff King p...@peff.net
  Max Horn m...@quendi.de
  Junio C Hamano gits...@pobox.com

Thus it can be used for 'git send-email' as a cc-cmd.

There might be some other related functions to this script, not just to
be used as a cc-cmd.

Comments-by: Ramkumar Ramachandra artag...@gmail.com
Signed-off-by: Felipe Contreras felipe.contre...@gmail.com
---

Sames as v5, with a few tiny modifications. I'm tired of sending the whole
series multiple times over the years, only to get stuck at the first patch, so
I'll send only the first one.

 contrib/related/git-related | 124 
 1 file changed, 124 insertions(+)
 create mode 100755 contrib/related/git-related

diff --git a/contrib/related/git-related b/contrib/related/git-related
new file mode 100755
index 000..b96dcdd
--- /dev/null
+++ b/contrib/related/git-related
@@ -0,0 +1,124 @@
+#!/usr/bin/env ruby
+
+# This script finds people that might be interested in a patch
+# usage: git related file
+
+$since = '5-years-ago'
+$min_percent = 10
+
+def fmt_person(name, email)
+  '%s %s' % [name, email]
+end
+
+class Commit
+
+  attr_reader :persons
+
+  def initialize(id)
+@id = id
+@persons = []
+  end
+
+  def parse(data)
+msg = nil
+data.each_line do |line|
+  if not msg
+case line
+when /^author ([^]+) (\S+) (.+)$/
+  @persons  fmt_person($1, $2)
+when /^$/
+  msg = true
+end
+  else
+if line =~ /^(Signed-off|Reviewed|Acked)-by: ([^]+) (\S+?)$/
+  @persons  fmt_person($2, $3)
+end
+  end
+end
+@persons.uniq!
+  end
+
+end
+
+class Commits
+
+  def initialize
+@items = {}
+  end
+
+  def size
+@items.size
+  end
+
+  def each(block)
+@items.each(block)
+  end
+
+  def import
+return if @items.empty?
+File.popen(%w[git cat-file --batch], 'r+') do |p|
+  p.write(@items.keys.join(\n))
+  p.close_write
+  p.each do |line|
+if line =~ /^(\h{40}) commit (\d+)/
+  id, len = $1, $2
+  data = p.read($2.to_i)
+  @items[id].parse(data)
+end
+  end
+end
+  end
+
+  def get_blame(source, start, len, from)
+return if len == 0
+len ||= 1
+File.popen(['git', 'blame', '--incremental', '-CCC',
+   '-L', '%u,+%u' % [start, len],
+   '--since', $since, from + '^',
+   '--', source]) do |p|
+  p.each do |line|
+if line =~ /^(\h{40})/
+  id = $
+  @items[id] = Commit.new(id)
+end
+  end
+end
+  end
+
+  def from_patch(file)
+from = source = nil
+File.open(file) do |f|
+  f.each do |line|
+case line
+when /^From (\h+) (.+)$/
+  from = $1
+when /^---\s+(\S+)/
+  source = $1 != '/dev/null' ? $1[2..-1] : nil
+when /^@@ -(\d+)(?:,(\d+))?/
+  get_blame(source, $1, $2, from)
+end
+  end
+end
+  end
+
+end
+
+exit 1 if ARGV.size != 1
+
+commits = Commits.new
+commits.from_patch(ARGV[0])
+commits.import
+
+count_per_person = Hash.new(0)
+
+commits.each do |id, commit|
+  commit.persons.each do |person|
+count_per_person[person] += 1
+  end
+end
+
+count_per_person.each do |person, count|
+  percent = count.to_f * 100 / commits.size
+  next if percent  $min_percent
+  puts person
+end
-- 
1.8.3.rc3.286.g3d43083

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html