Re: misleading diff-hunk header
On 08/25/2012 02:56 PM, Tim Chase wrote: On 08/24/12 23:29, Junio C Hamano wrote: Tim Chase g...@tim.thechases.com writes: If the documented purpose of diff -p (and by proxy diff.{type}.xfuncname) is to show the name of the *function* containing the changed lines, Yeah, the documentation is misleading, but I do not offhand think of a better phrasing. Perhaps you could send in a patch to improve it. How does GNU manual explain the option? Tersely. :-) -p --show-c-function Show which C function each change is in. That's in the manpage, which is basically just a copy of the output from diff --help. In the texinfo manual (which is the real documentation), there are additional explanations, saying, among other things: To show in which functions differences occur for C and similar languages, you can use the --show-c-function (-p) option. This option automatically defaults to the context output format (see Context Format), with the default number of lines of context. You can override that number with -C lines elsewhere in the command line. You can override both the format and the number with -U lines elsewhere in the command line. The -p option is equivalent to -F '^[[:alpha:]$_]' if the unified format is specified, otherwise -c -F '^[[:alpha:]$_]' (see Specified Headings). GNU diff provides this option for the sake of convenience. ... The --show-function-line (-F) option finds the nearest unchanged line that precedes each hunk of differences and matches the given regular expression. You can find more information in the on-line documentation: http://www.gnu.org/software/diffutils/manual/diffutils.html HTH, Stefano -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
Stefano Lattarini stefano.lattar...@gmail.com writes: On 08/25/2012 02:56 PM, Tim Chase wrote: On 08/24/12 23:29, Junio C Hamano wrote: Tim Chase g...@tim.thechases.com writes: If the documented purpose of diff -p (and by proxy diff.{type}.xfuncname) is to show the name of the *function* containing the changed lines, Yeah, the documentation is misleading, but I do not offhand think of a better phrasing. Perhaps you could send in a patch to improve it. How does GNU manual explain the option? Tersely. :-) -p --show-c-function Show which C function each change is in. That's in the manpage, which is basically just a copy of the output from diff --help. In the texinfo manual (which is the real documentation), there are additional explanations, saying, among other things: To show in which functions differences occur for C and similar languages, you can use the --show-c-function (-p) option. This option automatically defaults to the context output format (see Context Format), with the default number of lines of context. You can override that number with -C lines elsewhere in the command line. You can override both the format and the number with -U lines elsewhere in the command line. The -p option is equivalent to -F '^[[:alpha:]$_]' if the unified format is specified, otherwise -c -F '^[[:alpha:]$_]' (see Specified Headings). GNU diff provides this option for the sake of convenience. ... The --show-function-line (-F) option finds the nearest unchanged line that precedes each hunk of differences and matches the given regular expression. So in short, if we say Show which function each change is in in the documentation, that is consistent with what GNU does and that is described consistently with what GNU says, modulo that we obviously do more than C via the diff.driver.xfuncname mechanism. We already document diff.driver.xfuncname as determining the hunk header, and the documentation that is referred to (i.e. gitattributes) shows the shape of a hunk in the diff output: @@ -k,l +n,m @@ TEXT This is called a 'hunk header'. The TEXT portion is by default a line that begins with an alphabet, an underscore or a dollar sign; this matches what GNU 'diff -p' output uses. and then later says: Then, you would define a diff.tex.xfuncname configuration to specify a regular expression that matches a line that you would want to appear as the hunk header TEXT. Honestly, I do not offhand see an obvious and possible room for vast improvement over what we already have, so my RFH to Tim that appears at the beginning of what you quoted from my previous message still stands ;-). Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
On Tue, Aug 21, 2012 at 10:52:03AM -0700, Junio C Hamano wrote: diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. [...] @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I think that's intentional, and matches what 'diff -p' does. It gives you the context before the hunk. After all, if a new function starts in the leading context lines, you can see that in the usual diff data. Correct. It is about give the user _more_ hint/clue on the context of the hunk, in addition to what the user can see in the pre-context of the hunk, so it is pointless to hoist int main() there. I don't think it is pointless. If you are skimming a diff, then the hunk headers stand out to easily show which functions were touched. Of course, as you mentioned later in your email, it is not an exhaustive list, and I think for Tim's use case, he needs to actually read and parse the whole patch. But mentioning call_me here _is_ pointless, because it is not relevant context at all (it was not modified; it just happens to be located near the code in question). So I would argue that showing main() is more useful to a reader. It gets even more obvious as you increase the context. Imagine I have code like this: int foo(void) { return 1; } int bar(void) { return 2; } int baz(void) { return 3; } and I modify baz to return 4 instead. With the regular diff settings, the hunk header would claim that bar() is the context in the hunk header. But if I ask for -U7, then foo() is mentioned in the hunk header. To me, that doesn't make sense; the modification is exactly the same, so why would the hunk header differ? I suppose one could argue that the hunk header is not showing the context of the change, but rather the context of the surrounding context lines. But that doesn't seem useful to me. We discussed this a while ago and you did a how about this patch: http://article.gmane.org/gmane.comp.version-control.git/181385 Gmane seems to be acting up this morning, so here is the patch (and your comment) for reference: Would this be sufficient? Instead of looking for the first line that matches the beginning pattern going backwards starting from one line before the displayed context, we start our examination at the first line shown in the context. xdiff/xemit.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 277e2ee..5f9c0e0 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg-flags XDL_EMIT_FUNCNAMES) { long l; - for (l = s1 - 1; l = 0 l funclineprev; l--) { + for (l = s1; l = 0 l funclineprev; l--) { const char *rec; long reclen = xdl_get_rec(xe-xdf1, l, rec); long newfunclen = ff(rec, reclen, funcbuf, In the case we were discussing then, the modified function started on the first line of context. But as Tim's example shows, it doesn't necessarily have to. I think it would make more sense to start counting from the first modified line. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
On 08/24/12 09:29, Jeff King wrote: On Tue, Aug 21, 2012 at 10:52:03AM -0700, Junio C Hamano wrote: diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. [...] @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I think that's intentional, and matches what 'diff -p' does. ... xdiff/xemit.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 277e2ee..5f9c0e0 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg-flags XDL_EMIT_FUNCNAMES) { long l; -for (l = s1 - 1; l = 0 l funclineprev; l--) { +for (l = s1; l = 0 l funclineprev; l--) { const char *rec; long reclen = xdl_get_rec(xe-xdf1, l, rec); long newfunclen = ff(rec, reclen, funcbuf, In the case we were discussing then, the modified function started on the first line of context. But as Tim's example shows, it doesn't necessarily have to. I think it would make more sense to start counting from the first modified line. Junio mentions that it matches the diff -p output, though I'd consider that a bug in diff as well, since the diff(1) man/info pages state -p Show which C function each change is in. In the above (both with diff -p and with git), the change was clearly in main() but it's not showing main(). Documented behavior and implemented behavior conflict. Starting at the first differing line rather than the first line of context in the hunk would ameliorate this. It doesn't address what happens if multiple functions were changed in the same hunk, but at least it becomes correct for the first one. More complex code might be doable to split hunks if an xfuncname match occurs between two disjoint changes in the same hunk. But for my purposes here, the above should suffice. -tkc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
On Fri, Aug 24, 2012 at 10:29:09AM -0400, Jeff King wrote: Would this be sufficient? Instead of looking for the first line that matches the beginning pattern going backwards starting from one line before the displayed context, we start our examination at the first line shown in the context. xdiff/xemit.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 277e2ee..5f9c0e0 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg-flags XDL_EMIT_FUNCNAMES) { long l; - for (l = s1 - 1; l = 0 l funclineprev; l--) { + for (l = s1; l = 0 l funclineprev; l--) { const char *rec; long reclen = xdl_get_rec(xe-xdf1, l, rec); long newfunclen = ff(rec, reclen, funcbuf, In the case we were discussing then, the modified function started on the first line of context. But as Tim's example shows, it doesn't necessarily have to. I think it would make more sense to start counting from the first modified line. That patch would look something like this: diff --git a/xdiff/xemit.c b/xdiff/xemit.c index d11dbf9..441ecf5 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -194,7 +194,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg-flags XDL_EMIT_FUNCNAMES) { get_func_line(xe, xecfg, func_line, - s1 - 1, funclineprev); + xche-i1 - 1, funclineprev); funclineprev = s1 - 1; } if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2, Note that this breaks a ton of tests. Some of them are just noise (e.g., t4042 changes line 2, so line 1 is the top of the context; before, we would show no hunk header, since we were at the top of the file, but now we will show line 1). Some of them are improved in the way that this patch intends (e.g., t4051). But some I'm not sure of. For instance, the failure in t4018.38 is odd. I think it's because the pattern it is looking for is a somewhat odd toy example (it's looking for a line with s in it, so naturally when we shift the start-point of our search, we are likely to find some other false positive). But it raises an interesting point: what if the pattern is just looking for lines in a list, and not an enclosing function? For example, imagine you have a file a list of items, one per line. With the old code, you'd get: diff --git a/old b/new index f384549..1066a25 100644 --- a/old +++ b/new @@ -2,3 +2,3 @@ one two -three +three -- modified four So the hunk header is showing you something useful; the element just above your context. But with my patch, you'd see: diff --git a/old b/new index f384549..1066a25 100644 --- a/old +++ b/new @@ -2,3 +2,3 @@ two two -three +three -- modified four I.e., it shows the element just before the change, which is already in the context anyway. So it's actually less useful. Although note that the current behavior is not all that useful, either; it is not really giving you any information about the change, but rather just showing one extra line of context. So I would say that which you would prefer might depend on exactly what you are diffing. But I would also argue that in any case where the new code produces a worse result, the hunk header was not all that useful to begin with. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
On 08/24/12 11:44, Jeff King wrote: With the old code, you'd get: diff --git a/old b/new index f384549..1066a25 100644 --- a/old +++ b/new @@ -2,3 +2,3 @@ one two -three +three -- modified four So the hunk header is showing you something useful; the element just above your context. But with my patch, you'd see: diff --git a/old b/new index f384549..1066a25 100644 --- a/old +++ b/new @@ -2,3 +2,3 @@ two two -three +three -- modified four I.e., it shows the element just before the change, which is already in the context anyway. So it's actually less useful. Although note that the current behavior is not all that useful, either; it is not really giving you any information about the change, but rather just showing one extra line of context. So I would say that which you would prefer might depend on exactly what you are diffing. But I would also argue that in any case where the new code produces a worse result, the hunk header was not all that useful to begin with. If the documented purpose of diff -p (and by proxy diff.{type}.xfuncname) is to show the name of the *function* containing the changed lines, and all you have is a list of lines with no function names, it's pretty arbitrary to call either behavior worse. :-) -tkc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
Tim Chase g...@tim.thechases.com writes: If the documented purpose of diff -p (and by proxy diff.{type}.xfuncname) is to show the name of the *function* containing the changed lines, Yeah, the documentation is misleading, but I do not offhand think of a better phrasing. Perhaps you could send in a patch to improve it. How does GNU manual explain the option? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
misleading diff-hunk header
[posted originally to git-users@ but advised this would be a better forum] diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. To reproduce: $ mkdir tmp $ cd tmp $ git init $ cat foo.c EOF int call_me(int maybe) { } int main() { } EOF $ git add foo.c $ git commit -m Initial checkin $ ed foo.c # main() should return 0 $i return 0; . wq $ git diff The diff returns a header line of @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I'm running 1.7.2.5 from Debian Stable if that makes a difference. Is this expected/proper behavior? -tkc FWIW, I stumbled across this tinkering with diff.{typename}.xfuncname detailed in gitattributes(5) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
misleading diff-hunk header
[posted originally to git-users@ but advised this would be a better forum] diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. To reproduce: $ mkdir tmp $ cd tmp $ git init $ cat foo.c EOF int call_me(int maybe) { } int main() { } EOF $ git add foo.c $ git commit -m Initial checkin $ ed foo.c # main() should return 0 $i return 0; . wq $ git diff The diff returns a header line of @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I'm running 1.7.2.5 from Debian Stable if that makes a difference. Is this expected/proper behavior? -tkc FWIW, I stumbled across this tinkering with diff.{typename}.xfuncname detailed in gitattributes(5) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
Tim Chase g...@tim.thechases.com writes: diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. [...] @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I think that's intentional, and matches what 'diff -p' does. It gives you the context before the hunk. After all, if a new function starts in the leading context lines, you can see that in the usual diff data. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
On 08/21/12 10:22, Thomas Rast wrote: Tim Chase g...@tim.thechases.com writes: diff.{type}.xfuncname seems to start searching backwards in from the beginning of the hunk, not the first differing line. [...] @@ -4,4 +4,5 @@ int call_me(int maybe) int main() { + return 0; } misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I think that's intentional, and matches what 'diff -p' does. It gives you the context before the hunk. After all, if a new function starts in the leading context lines, you can see that in the usual diff data. Okay...I tested diff -p and can't argue (much) with historical adherence. It just makes it hard for me to gather some stats on the functions that changed, and requires that I look in more than one place (both in the header, and in the leading context) rather than having a single authoritative place to grep. Then again, diff -p only seems to support C functions, while git supports bibtex, cpp, html, java, objc, pascal, php, python, ruby, and tex out-of-the-box, with the option to build your own function-finder, so pure adherence to history gets a little muddied. Thanks for your thoughts, -tkc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misleading diff-hunk header
Am 21.08.2012 17:42, schrieb Tim Chase: On 08/21/12 10:22, Thomas Rast wrote: misleadingly suggesting that the change occurred in the call_me() function, rather than in main() I think that's intentional, and matches what 'diff -p' does... Okay...I tested diff -p and can't argue (much) with historical adherence. It just makes it hard for me to gather some stats on the functions that changed, and requires that I look in more than one place (both in the header, and in the leading context) rather than having a single authoritative place to grep. If it's only for stats, why not just remove the context with -U0? -- Hannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html