Re: misleading diff-hunk header

2012-08-26 Thread Stefano Lattarini
On 08/25/2012 02:56 PM, Tim Chase wrote:
 On 08/24/12 23:29, Junio C Hamano wrote:
 Tim Chase g...@tim.thechases.com writes:
 If the documented purpose of diff -p (and by proxy
 diff.{type}.xfuncname) is to show the name of the *function*
 containing the changed lines,

 Yeah, the documentation is misleading, but I do not offhand think of
 a better phrasing. Perhaps you could send in a patch to improve it.

 How does GNU manual explain the option?
 
 Tersely. :-)
 
-p  --show-c-function
   Show which C function each change is in.

That's in the manpage, which is basically just a copy of the output from
diff --help.  In the texinfo manual (which is the real documentation),
there are additional explanations, saying, among other things:

To show in which functions differences occur for C and similar languages,
you can use the --show-c-function (-p) option. This option automatically
defaults to the context output format (see Context Format), with the
default number of lines of context. You can override that number with
-C lines elsewhere in the command line. You can override both the format
and the number with -U lines elsewhere in the command line.
The -p option is equivalent to -F '^[[:alpha:]$_]' if the unified format
is specified, otherwise -c -F '^[[:alpha:]$_]' (see Specified Headings).
GNU diff provides this option for the sake of convenience.
...
The --show-function-line (-F) option finds the nearest unchanged line
that precedes each hunk of differences and matches the given regular
expression.

You can find more information in the on-line documentation:

  http://www.gnu.org/software/diffutils/manual/diffutils.html

HTH,
  Stefano
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-26 Thread Junio C Hamano
Stefano Lattarini stefano.lattar...@gmail.com writes:

 On 08/25/2012 02:56 PM, Tim Chase wrote:
 On 08/24/12 23:29, Junio C Hamano wrote:
 Tim Chase g...@tim.thechases.com writes:
 If the documented purpose of diff -p (and by proxy
 diff.{type}.xfuncname) is to show the name of the *function*
 containing the changed lines,

 Yeah, the documentation is misleading, but I do not offhand think of
 a better phrasing. Perhaps you could send in a patch to improve it.

 How does GNU manual explain the option?
 
 Tersely. :-)
 
-p  --show-c-function
   Show which C function each change is in.

 That's in the manpage, which is basically just a copy of the output from
 diff --help.  In the texinfo manual (which is the real documentation),
 there are additional explanations, saying, among other things:

 To show in which functions differences occur for C and similar languages,
 you can use the --show-c-function (-p) option. This option automatically
 defaults to the context output format (see Context Format), with the
 default number of lines of context. You can override that number with
 -C lines elsewhere in the command line. You can override both the format
 and the number with -U lines elsewhere in the command line.
 The -p option is equivalent to -F '^[[:alpha:]$_]' if the unified format
 is specified, otherwise -c -F '^[[:alpha:]$_]' (see Specified Headings).
 GNU diff provides this option for the sake of convenience.
 ...
 The --show-function-line (-F) option finds the nearest unchanged line
 that precedes each hunk of differences and matches the given regular
 expression.

So in short, if we say Show which function each change is in in
the documentation, that is consistent with what GNU does and that is
described consistently with what GNU says, modulo that we obviously
do more than C via the diff.driver.xfuncname mechanism.

We already document diff.driver.xfuncname as determining the hunk
header, and the documentation that is referred to (i.e. gitattributes)
shows the shape of a hunk in the diff output:

@@ -k,l +n,m @@ TEXT

This is called a 'hunk header'.  The TEXT portion is by default a line
that begins with an alphabet, an underscore or a dollar sign; this
matches what GNU 'diff -p' output uses.

and then later says:

Then, you would define a diff.tex.xfuncname configuration to
specify a regular expression that matches a line that you would
want to appear as the hunk header TEXT.

Honestly, I do not offhand see an obvious and possible room for vast
improvement over what we already have, so my RFH to Tim that appears
at the beginning of what you quoted from my previous message still
stands ;-).

Thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-24 Thread Jeff King
On Tue, Aug 21, 2012 at 10:52:03AM -0700, Junio C Hamano wrote:

  diff.{type}.xfuncname seems to start searching backwards in
  from the beginning of the hunk, not the first differing line.
  [...]
@@ -4,4 +4,5 @@ int call_me(int maybe)
 
 int main()
 {
+  return 0;
 }
 
  misleadingly suggesting that the change occurred in the call_me()
  function, rather than in main()
  
  I think that's intentional, and matches what 'diff -p' does.  It gives
  you the context before the hunk.  After all, if a new function starts in
  the leading context lines, you can see that in the usual diff data.
 
 Correct.  It is about give the user _more_ hint/clue on the context
 of the hunk, in addition to what the user can see in the
 pre-context of the hunk, so it is pointless to hoist int main()
 there.

I don't think it is pointless. If you are skimming a diff, then the hunk
headers stand out to easily show which functions were touched. Of
course, as you mentioned later in your email, it is not an exhaustive
list, and I think for Tim's use case, he needs to actually read and
parse the whole patch.

But mentioning call_me here _is_ pointless, because it is not relevant
context at all (it was not modified; it just happens to be located near
the code in question).  So I would argue that showing main() is more
useful to a reader.

It gets even more obvious as you increase the context. Imagine I have
code like this:

   int foo(void)
   {
  return 1;
   }

   int bar(void)
   {
  return 2;
   }

   int baz(void)
   {
  return 3;
   }

and I modify baz to return 4 instead. With the regular diff
settings, the hunk header would claim that bar() is the context in the
hunk header. But if I ask for -U7, then foo() is mentioned in the hunk
header. To me, that doesn't make sense; the modification is exactly the
same, so why would the hunk header differ?

I suppose one could argue that the hunk header is not showing the
context of the change, but rather the context of the surrounding context
lines. But that doesn't seem useful to me.

We discussed this a while ago and you did a how about this patch:

  http://article.gmane.org/gmane.comp.version-control.git/181385

Gmane seems to be acting up this morning, so here is the patch (and your
comment) for reference:

 Would this be sufficient?  Instead of looking for the first line that
 matches the beginning pattern going backwards starting from one line
 before the displayed context, we start our examination at the first line
 shown in the context.
 
  xdiff/xemit.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/xdiff/xemit.c b/xdiff/xemit.c
 index 277e2ee..5f9c0e0 100644
 --- a/xdiff/xemit.c
 +++ b/xdiff/xemit.c
 @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
 xdemitcb_t *ecb,
  
   if (xecfg-flags  XDL_EMIT_FUNCNAMES) {
   long l;
 - for (l = s1 - 1; l = 0  l  funclineprev; l--) {
 + for (l = s1; l = 0  l  funclineprev; l--) {
   const char *rec;
   long reclen = xdl_get_rec(xe-xdf1, l, rec);
   long newfunclen = ff(rec, reclen, funcbuf,

In the case we were discussing then, the modified function started on
the first line of context. But as Tim's example shows, it doesn't
necessarily have to. I think it would make more sense to start counting
from the first modified line.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-24 Thread Tim Chase
On 08/24/12 09:29, Jeff King wrote:
 On Tue, Aug 21, 2012 at 10:52:03AM -0700, Junio C Hamano wrote:
 
 diff.{type}.xfuncname seems to start searching backwards in
 from the beginning of the hunk, not the first differing line.
 [...]
   @@ -4,4 +4,5 @@ int call_me(int maybe)

int main()
{
   +  return 0;
}

 misleadingly suggesting that the change occurred in the call_me()
 function, rather than in main()

 I think that's intentional, and matches what 'diff -p' does. 
...
  xdiff/xemit.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/xdiff/xemit.c b/xdiff/xemit.c
 index 277e2ee..5f9c0e0 100644
 --- a/xdiff/xemit.c
 +++ b/xdiff/xemit.c
 @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
 xdemitcb_t *ecb,
  
  if (xecfg-flags  XDL_EMIT_FUNCNAMES) {
  long l;
 -for (l = s1 - 1; l = 0  l  funclineprev; l--) {
 +for (l = s1; l = 0  l  funclineprev; l--) {
  const char *rec;
  long reclen = xdl_get_rec(xe-xdf1, l, rec);
  long newfunclen = ff(rec, reclen, funcbuf,
 
 In the case we were discussing then, the modified function started on
 the first line of context. But as Tim's example shows, it doesn't
 necessarily have to. I think it would make more sense to start counting
 from the first modified line.

Junio mentions that it matches the diff -p output, though I'd
consider that a bug in diff as well, since the diff(1) man/info
pages state -p  Show which C function each change is in.  In the
above (both with diff -p and with git), the change was clearly in
main() but it's not showing main().  Documented behavior and
implemented behavior conflict.

Starting at the first differing line rather than the first line of
context in the hunk would ameliorate this.  It doesn't address what
happens if multiple functions were changed in the same hunk, but at
least it becomes correct for the first one.  More complex code might
be doable to split hunks if an xfuncname match occurs between two
disjoint changes in the same hunk.  But for my purposes here, the
above should suffice.

-tkc



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-24 Thread Jeff King
On Fri, Aug 24, 2012 at 10:29:09AM -0400, Jeff King wrote:

  Would this be sufficient?  Instead of looking for the first line that
  matches the beginning pattern going backwards starting from one line
  before the displayed context, we start our examination at the first line
  shown in the context.
  
   xdiff/xemit.c |2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)
  
  diff --git a/xdiff/xemit.c b/xdiff/xemit.c
  index 277e2ee..5f9c0e0 100644
  --- a/xdiff/xemit.c
  +++ b/xdiff/xemit.c
  @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
  xdemitcb_t *ecb,
   
  if (xecfg-flags  XDL_EMIT_FUNCNAMES) {
  long l;
  -   for (l = s1 - 1; l = 0  l  funclineprev; l--) {
  +   for (l = s1; l = 0  l  funclineprev; l--) {
  const char *rec;
  long reclen = xdl_get_rec(xe-xdf1, l, rec);
  long newfunclen = ff(rec, reclen, funcbuf,
 
 In the case we were discussing then, the modified function started on
 the first line of context. But as Tim's example shows, it doesn't
 necessarily have to. I think it would make more sense to start counting
 from the first modified line.

That patch would look something like this:

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index d11dbf9..441ecf5 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -194,7 +194,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
xdemitcb_t *ecb,
 
if (xecfg-flags  XDL_EMIT_FUNCNAMES) {
get_func_line(xe, xecfg, func_line,
- s1 - 1, funclineprev);
+ xche-i1 - 1, funclineprev);
funclineprev = s1 - 1;
}
if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,

Note that this breaks a ton of tests. Some of them are just noise (e.g.,
t4042 changes line 2, so line 1 is the top of the context; before, we
would show no hunk header, since we were at the top of the file, but now
we will show line 1). Some of them are improved in the way that this
patch intends (e.g., t4051).

But some I'm not sure of. For instance, the failure in t4018.38 is odd.
I think it's because the pattern it is looking for is a somewhat odd toy
example (it's looking for a line with s in it, so naturally when we
shift the start-point of our search, we are likely to find some other
false positive). But it raises an interesting point: what if the pattern
is just looking for lines in a list, and not an enclosing function?

For example, imagine you have a file a list of items, one per line.
With the old code, you'd get:

diff --git a/old b/new
index f384549..1066a25 100644
--- a/old
+++ b/new
@@ -2,3 +2,3 @@ one
 two
-three
+three -- modified
 four

So the hunk header is showing you something useful; the element just
above your context. But with my patch, you'd see:

diff --git a/old b/new
index f384549..1066a25 100644
--- a/old
+++ b/new
@@ -2,3 +2,3 @@ two
 two
-three
+three -- modified
 four

I.e., it shows the element just before the change, which is already in
the context anyway. So it's actually less useful. Although note that the
current behavior is not all that useful, either; it is not really giving
you any information about the change, but rather just showing one extra
line of context.

So I would say that which you would prefer might depend on exactly what
you are diffing. But I would also argue that in any case where the new
code produces a worse result, the hunk header was not all that useful to
begin with.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-24 Thread Tim Chase
On 08/24/12 11:44, Jeff King wrote:
 With the old code, you'd get:
 
   diff --git a/old b/new
   index f384549..1066a25 100644
   --- a/old
   +++ b/new
   @@ -2,3 +2,3 @@ one
two
   -three
   +three -- modified
four
 
 So the hunk header is showing you something useful; the element just
 above your context. But with my patch, you'd see:
 
   diff --git a/old b/new
   index f384549..1066a25 100644
   --- a/old
   +++ b/new
   @@ -2,3 +2,3 @@ two
two
   -three
   +three -- modified
four
 
 I.e., it shows the element just before the change, which is already in
 the context anyway. So it's actually less useful. Although note that the
 current behavior is not all that useful, either; it is not really giving
 you any information about the change, but rather just showing one extra
 line of context.
 
 So I would say that which you would prefer might depend on exactly what
 you are diffing. But I would also argue that in any case where the new
 code produces a worse result, the hunk header was not all that useful to
 begin with.

If the documented purpose of diff -p (and by proxy
diff.{type}.xfuncname) is to show the name of the *function*
containing the changed lines, and all you have is a list of lines
with no function names, it's pretty arbitrary to call either
behavior worse. :-)

-tkc


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-24 Thread Junio C Hamano
Tim Chase g...@tim.thechases.com writes:

 If the documented purpose of diff -p (and by proxy
 diff.{type}.xfuncname) is to show the name of the *function*
 containing the changed lines,

Yeah, the documentation is misleading, but I do not offhand think of
a better phrasing. Perhaps you could send in a patch to improve it.

How does GNU manual explain the option?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


misleading diff-hunk header

2012-08-21 Thread Tim Chase
[posted originally to git-users@ but advised this would be a better
forum]

diff.{type}.xfuncname seems to start searching backwards in
from the beginning of the hunk, not the first differing line.

To reproduce:

  $ mkdir tmp
  $ cd tmp
  $ git init
  $ cat  foo.c EOF
  int call_me(int maybe)
  {
  }

  int main()
  {
  }
  EOF
  $ git add foo.c
  $ git commit -m Initial checkin
  $ ed foo.c
  # main() should return 0
  $i
return 0;
  .
  wq
  $ git diff

The diff returns a header line of

  @@ -4,4 +4,5 @@ int call_me(int maybe)

   int main()
   {
  +  return 0;
   }

misleadingly suggesting that the change occurred in the call_me()
function, rather than in main()

I'm running 1.7.2.5 from Debian Stable if that makes a difference.

Is this expected/proper behavior?

-tkc

FWIW, I stumbled across this tinkering with
diff.{typename}.xfuncname detailed in gitattributes(5)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


misleading diff-hunk header

2012-08-21 Thread Tim Chase
[posted originally to git-users@ but advised this would be a better
forum]

diff.{type}.xfuncname seems to start searching backwards in
from the beginning of the hunk, not the first differing line.

To reproduce:

  $ mkdir tmp
  $ cd tmp
  $ git init
  $ cat  foo.c EOF
  int call_me(int maybe)
  {
  }

  int main()
  {
  }
  EOF
  $ git add foo.c
  $ git commit -m Initial checkin
  $ ed foo.c
  # main() should return 0
  $i
return 0;
  .
  wq
  $ git diff

The diff returns a header line of

  @@ -4,4 +4,5 @@ int call_me(int maybe)

   int main()
   {
  +  return 0;
   }

misleadingly suggesting that the change occurred in the call_me()
function, rather than in main()

I'm running 1.7.2.5 from Debian Stable if that makes a difference.

Is this expected/proper behavior?

-tkc

FWIW, I stumbled across this tinkering with
diff.{typename}.xfuncname detailed in gitattributes(5)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-21 Thread Thomas Rast
Tim Chase g...@tim.thechases.com writes:

 diff.{type}.xfuncname seems to start searching backwards in
 from the beginning of the hunk, not the first differing line.
[...]
   @@ -4,4 +4,5 @@ int call_me(int maybe)

int main()
{
   +  return 0;
}

 misleadingly suggesting that the change occurred in the call_me()
 function, rather than in main()

I think that's intentional, and matches what 'diff -p' does.  It gives
you the context before the hunk.  After all, if a new function starts in
the leading context lines, you can see that in the usual diff data.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-21 Thread Tim Chase
On 08/21/12 10:22, Thomas Rast wrote:
 Tim Chase g...@tim.thechases.com writes:
 
 diff.{type}.xfuncname seems to start searching backwards in
 from the beginning of the hunk, not the first differing line.
 [...]
   @@ -4,4 +4,5 @@ int call_me(int maybe)

int main()
{
   +  return 0;
}

 misleadingly suggesting that the change occurred in the call_me()
 function, rather than in main()
 
 I think that's intentional, and matches what 'diff -p' does.  It gives
 you the context before the hunk.  After all, if a new function starts in
 the leading context lines, you can see that in the usual diff data.

Okay...I tested diff -p and can't argue (much) with historical
adherence.  It just makes it hard for me to gather some stats on the
functions that changed, and requires that I look in more than one
place (both in the header, and in the leading context) rather than
having a single authoritative place to grep.

Then again, diff -p only seems to support C functions, while git
supports bibtex, cpp, html, java, objc, pascal, php, python, ruby,
and tex out-of-the-box, with the option to build your own
function-finder, so pure adherence to history gets a little muddied.

Thanks for your thoughts,

-tkc


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misleading diff-hunk header

2012-08-21 Thread Johannes Sixt
Am 21.08.2012 17:42, schrieb Tim Chase:
 On 08/21/12 10:22, Thomas Rast wrote:
 misleadingly suggesting that the change occurred in the call_me()
 function, rather than in main()

 I think that's intentional, and matches what 'diff -p' does...
 
 Okay...I tested diff -p and can't argue (much) with historical
 adherence.  It just makes it hard for me to gather some stats on the
 functions that changed, and requires that I look in more than one
 place (both in the header, and in the leading context) rather than
 having a single authoritative place to grep.

If it's only for stats, why not just remove the context with -U0?

-- Hannes

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html