Re: [PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-09-07 Thread Junio C Hamano
Michael Haggerty  writes:

> The reason that I would prefer to change `blame` as part of this patch
> series is that I think it would be disconcerting for `git diff` and `git
> blame` to use different heuristics when computing diffs. It would make
> their output inconsistent.

I do think it is the right thing to do.  With your shifting heuristics,
"git diff" would attribute an addition of a whole block more
correctly, e.g.

 }

+foo {
+   bar
+   baz
+}

instead of attributing the tail of the new thing to the old author,
and the "blame" should take advantage of the better heuristics as
well.


Re: [PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-09-04 Thread Michael Haggerty
On 08/23/2016 11:56 AM, René Scharfe wrote:
> Am 22.08.2016 um 13:22 schrieb Michael Haggerty:
>> "git blame" already parsed generic diff options from the command line
>> via diff_opt_parse(), but instead of passing the resulting xdl_opts to
>> xdi_diff(), it sent its own xdl_opts, which only reflected the values of
>> the self-parsed options "-w" and "--minimal". Instead, rely on
>> diff_opt_parse() to parse all of the diff options, including "-w" and
>> "--minimal", and pass the resulting xdl_opts to xdi_diff().
> 
> Sounds useful: It allows more fine-grained control over which whitespace
> changes to ignore and which diff algorithm to use.  There is a bit of
> overlap (e.g. with -b meaning show blank boundaries vs. ignore
> whitespace changes), but with your patch blame's own options still take
> precedence, so there should be no unpleasant surprises.
> 
>> Signed-off-by: Michael Haggerty 
>> ---
>> Somebody who knows more about how diff operations are configured
>> should please review this. I'm not certain that the change as
>> implemented won't have other unwanted side-effects, though of course
>> I checked that the test suite runs correctly.
> 
> I don't qualify, but I'll comment anyway..
> 
>>  builtin/blame.c|  11 ++--
>>  t/t4059-diff-indent.sh | 160
>> +
>>  2 files changed, 165 insertions(+), 6 deletions(-)
>>  create mode 100755 t/t4059-diff-indent.sh
> 
> This new test doesn't call git blame.  Does it belong to a different
> commit?  And shouldn't the change to blame.c stand on its own, outside
> of this series?

Thanks for catching this. I squashed the test incorrectly; it was meant
to be part of the previous commit. I'll fix it in v3.

The reason that I would prefer to change `blame` as part of this patch
series is that I think it would be disconcerting for `git diff` and `git
blame` to use different heuristics when computing diffs. It would make
their output inconsistent.

But probably that means passing just the new option to `git blame`
rather than passing all possible `diff` options through.

> [...]

Your other comments may be moot given that changing `git blame` in this
way seems to have been a bad idea in the first place [1]. But I'll keep
them in mind if a new version contains similar code.

Thanks,
Michael

[1]
http://public-inbox.org/git/xmqqtwebwhbg@gitster.mtv.corp.google.com/



Re: [PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-08-23 Thread Junio C Hamano
Michael Haggerty  writes:

> Somebody who knows more about how diff operations are configured
> should please review this. I'm not certain that the change as
> implemented won't have other unwanted side-effects, though of course
> I checked that the test suite runs correctly.

Generally, I think this is a bad idea.  

We run "diff-tree" internally for multiple purposes:

 - When the same path we are drilling down appears, we use diff-tree
   to compare that path alone, to obtain textual diff, so that we
   can say "what did not appear in the textual diff output are
   attributable to the parent commit, we can exonerate this child".

   Even if the command line to "git blame" had "-Sfoo", you do not
   want to pass it down to this invocation.  If the child did not
   change the number of occurrences of string "foo" in the path
   being drilled down, it will pass down the blame for all lines to
   the parent.  And copying -R from the command line to this
   invocation does not make any sense.  Copying -C -C from the
   command line will defeat the whole purpose of having find_origin
   vs find_rename distinction, I would think, for this step.

 - When the path we have been drilling down for no longer appears in
   the parent, we use diff-tree with rename detection to find where
   the file _could_ have come from.

   It is a good idea to allow the similarity threshold to be tweaked
   from the command line.  Copying -R from the command line to this
   invocation may make sense, but I think you would break this step
   if you copied -C/-C -C

 - When -C/-C -C/-C -C -C is given from the command line, after
   running the "pass the blame to our primary suspect", we run
   tree-level diff-tree with find-copy option, only to enumerate
   paths that appear in the parent.  We pick pieces from these paths
   by doing blob-level diffs in diff_hunks() ourselves.  Copying -C
   from the command line would be useful if you are only doing a
   single '-C' (because it would allow you to tweak the similarity
   threshold), but otherwise it would probably break the command.

The design principle taken in "git blame" is to leave the decision
on what diff options do or do not make sense for these particular
invocations of the internal "diff-tree", and have "blame" give the
selected options to the internal "diff-tree" invocations.  "-w"
happens to use xdl_opts that will eventually be passed down to diff
machinery but you should consider it an accidental optimization
(i.e. "blame" designer considered that it makes sense to give
whitespace-ignoring comparison at the leaf-level "diff between
blobs", and found that the most efficient way to do so is to just
take "-w" from the command line and set the ignore-whitespace bit
used to drive the internal "diff-tree" invocation).  If we _were_
adding -R to allow the users to affect similarity threshold,
we would parse it in "git blame" command line option processing,
and would give that _ONLY_ to the second invocation above, i.e. the
one done in find_rename() but not in others like in find_origin.





--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-08-23 Thread René Scharfe

Am 22.08.2016 um 13:22 schrieb Michael Haggerty:

"git blame" already parsed generic diff options from the command line
via diff_opt_parse(), but instead of passing the resulting xdl_opts to
xdi_diff(), it sent its own xdl_opts, which only reflected the values of
the self-parsed options "-w" and "--minimal". Instead, rely on
diff_opt_parse() to parse all of the diff options, including "-w" and
"--minimal", and pass the resulting xdl_opts to xdi_diff().


Sounds useful: It allows more fine-grained control over which whitespace 
changes to ignore and which diff algorithm to use.  There is a bit of 
overlap (e.g. with -b meaning show blank boundaries vs. ignore 
whitespace changes), but with your patch blame's own options still take 
precedence, so there should be no unpleasant surprises.



Signed-off-by: Michael Haggerty 
---
Somebody who knows more about how diff operations are configured
should please review this. I'm not certain that the change as
implemented won't have other unwanted side-effects, though of course
I checked that the test suite runs correctly.


I don't qualify, but I'll comment anyway..


 builtin/blame.c|  11 ++--
 t/t4059-diff-indent.sh | 160 +
 2 files changed, 165 insertions(+), 6 deletions(-)
 create mode 100755 t/t4059-diff-indent.sh


This new test doesn't call git blame.  Does it belong to a different 
commit?  And shouldn't the change to blame.c stand on its own, outside 
of this series?



diff --git a/builtin/blame.c b/builtin/blame.c
index 7ec7823..cde2d15 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -48,11 +48,12 @@ static int show_root;
 static int reverse;
 static int blank_boundary;
 static int incremental;
-static int xdl_opts;
 static int abbrev = -1;
 static int no_whole_file_rename;
 static int show_progress;

+static struct rev_info revs;
+
 static struct date_mode blame_date_mode = { DATE_ISO8601 };
 static size_t blame_date_width;

@@ -137,11 +138,12 @@ struct progress_info {
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  xdl_emit_hunk_consume_func_t hunk_func, void *cb_data)
 {
-   xpparam_t xpp = {0};
+   xpparam_t xpp;
xdemitconf_t xecfg = {0};
xdemitcb_t ecb = {NULL};

-   xpp.flags = xdl_opts;
+   memset(&xpp, 0, sizeof(xpp));
+   xpp.flags = revs.diffopt.xdl_opts;


Why call memset instead of using a static initializer?  The intent of 
this patch is just to change the .flags assignment, isn't it?



xecfg.hunk_func = hunk_func;
ecb.priv = cb_data;
return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
@@ -2517,7 +2519,6 @@ static int blame_move_callback(const struct option 
*option, const char *arg, int

 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
-   struct rev_info revs;
const char *path;
struct scoreboard sb;
struct origin *o;
@@ -2548,8 +2549,6 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
OPT_BIT('l', NULL, &output_option, N_("Show long commit SHA1 
(Default: off)"), OUTPUT_LONG_OBJECT_NAME),
OPT_BIT('s', NULL, &output_option, N_("Suppress author name and 
timestamp (Default: off)"), OUTPUT_NO_AUTHOR),
OPT_BIT('e', "show-email", &output_option, N_("Show author email 
instead of name (Default: off)"), OUTPUT_SHOW_EMAIL),
-   OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace 
differences"), XDF_IGNORE_WHITESPACE),
-   OPT_BIT(0, "minimal", &xdl_opts, N_("Spend extra cycles to find 
better match"), XDF_NEED_MINIMAL),


This removes -w and --minimal from blame's short help; diff options 
should be mentioned somehow in exchange for that loss.  Or perhaps they 
should be mentioned in git-rev-list(1)?  (git blame -h points to 
git-rev-list(1) already.)


Documentation/git-blame.txt needs an update as well.


OPT_STRING('S', NULL, &revs_file, N_("file"), N_("Use revisions from 
 instead of calling git-rev-list")),
OPT_STRING(0, "contents", &contents_from, N_("file"), N_("Use 
's contents as the final image")),
{ OPTION_CALLBACK, 'C', NULL, &opt, N_("score"), N_("Find line 
copies within and across files"), PARSE_OPT_OPTARG, blame_copy_callback },


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-08-22 Thread Michael Haggerty
"git blame" already parsed generic diff options from the command line
via diff_opt_parse(), but instead of passing the resulting xdl_opts to
xdi_diff(), it sent its own xdl_opts, which only reflected the values of
the self-parsed options "-w" and "--minimal". Instead, rely on
diff_opt_parse() to parse all of the diff options, including "-w" and
"--minimal", and pass the resulting xdl_opts to xdi_diff().

Signed-off-by: Michael Haggerty 
---
Somebody who knows more about how diff operations are configured
should please review this. I'm not certain that the change as
implemented won't have other unwanted side-effects, though of course
I checked that the test suite runs correctly.

 builtin/blame.c|  11 ++--
 t/t4059-diff-indent.sh | 160 +
 2 files changed, 165 insertions(+), 6 deletions(-)
 create mode 100755 t/t4059-diff-indent.sh

diff --git a/builtin/blame.c b/builtin/blame.c
index 7ec7823..cde2d15 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -48,11 +48,12 @@ static int show_root;
 static int reverse;
 static int blank_boundary;
 static int incremental;
-static int xdl_opts;
 static int abbrev = -1;
 static int no_whole_file_rename;
 static int show_progress;
 
+static struct rev_info revs;
+
 static struct date_mode blame_date_mode = { DATE_ISO8601 };
 static size_t blame_date_width;
 
@@ -137,11 +138,12 @@ struct progress_info {
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  xdl_emit_hunk_consume_func_t hunk_func, void *cb_data)
 {
-   xpparam_t xpp = {0};
+   xpparam_t xpp;
xdemitconf_t xecfg = {0};
xdemitcb_t ecb = {NULL};
 
-   xpp.flags = xdl_opts;
+   memset(&xpp, 0, sizeof(xpp));
+   xpp.flags = revs.diffopt.xdl_opts;
xecfg.hunk_func = hunk_func;
ecb.priv = cb_data;
return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
@@ -2517,7 +2519,6 @@ static int blame_move_callback(const struct option 
*option, const char *arg, int
 
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
-   struct rev_info revs;
const char *path;
struct scoreboard sb;
struct origin *o;
@@ -2548,8 +2549,6 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
OPT_BIT('l', NULL, &output_option, N_("Show long commit SHA1 
(Default: off)"), OUTPUT_LONG_OBJECT_NAME),
OPT_BIT('s', NULL, &output_option, N_("Suppress author name and 
timestamp (Default: off)"), OUTPUT_NO_AUTHOR),
OPT_BIT('e', "show-email", &output_option, N_("Show author 
email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL),
-   OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace 
differences"), XDF_IGNORE_WHITESPACE),
-   OPT_BIT(0, "minimal", &xdl_opts, N_("Spend extra cycles to find 
better match"), XDF_NEED_MINIMAL),
OPT_STRING('S', NULL, &revs_file, N_("file"), N_("Use revisions 
from  instead of calling git-rev-list")),
OPT_STRING(0, "contents", &contents_from, N_("file"), N_("Use 
's contents as the final image")),
{ OPTION_CALLBACK, 'C', NULL, &opt, N_("score"), N_("Find line 
copies within and across files"), PARSE_OPT_OPTARG, blame_copy_callback },
diff --git a/t/t4059-diff-indent.sh b/t/t4059-diff-indent.sh
new file mode 100755
index 000..8b6c7ef
--- /dev/null
+++ b/t/t4059-diff-indent.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+
+test_description='Test diff indent heuristic.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+# Compare two diff outputs. Ignore "index" lines, because we don't
+# care about SHA-1s or file modes.
+compare_diff () {
+   sed -e "/^index /d" <"$1" >.tmp-1
+   sed -e "/^index /d" <"$2" >.tmp-2
+   test_cmp .tmp-1 .tmp-2 && rm -f .tmp-1 .tmp-2
+}
+
+test_expect_success 'diff: favor trailing blank lines' '
+   cat <<-\EOF >old &&
+   1
+   2
+   a
+
+   b
+   3
+   4
+   EOF
+
+   cat <<-\EOF >new &&
+   1
+   2
+   a
+
+   b
+   a
+
+   b
+   3
+   4
+   EOF
+
+   tr "_" " " <<-\EOF >expect &&
+   diff --git a/old b/new
+   --- a/old
+   +++ b/new
+   @@ -3,5 +3,8 @@
+a
+   _
+b
+   +a
+   +
+   +b
+3
+4
+   EOF
+
+   tr "_" " " <<-\EOF >expect-compacted &&
+   diff --git a/old b/new
+   --- a/old
+   +++ b/new
+   @@ -2,6 +2,9 @@
+2
+a
+   _
+   +b
+   +a
+   +
+b
+3
+4
+   EOF
+
+   test_must_fail git diff --no-index old new >out &&
+   compare_diff expect out &&
+
+   test_must_fail git diff --no-index --indent-heuristic old new 
>out-compacted &&
+   compare_diff expect-compacted out-compacted &&
+
+   test_must_fail git -c diff.indentHeuristic=true diff --no-index old new 
>out-compacted2 &&
+   compare_diff expect-compacted out-compacted2