Re: [PATCH] diff.c: color moved lines differently

2017-06-14 Thread Stefan Beller
On Tue, Jun 13, 2017 at 3:51 PM, Jonathan Tan  wrote:
> On Wed, 31 May 2017 17:24:29 -0700
> Stefan Beller  wrote:
>
>> When a patch consists mostly of moving blocks of code around, it can
>> be quite tedious to ensure that the blocks are moved verbatim, and not
>> undesirably modified in the move. To that end, color blocks that are
>> moved within the same patch differently. For example (OM, del, add,
>> and NM are different colors):
>
> [snip]
>
> Junio asks "are we happy with these changes" [1] and my answer is, in
> general, yes - this seems like a very useful feature to have, and I'm OK
> with the current design.
>
> I do feel a bit of unease at how the emitted strings are collected
> without many guarantees as to their contents (e.g. whether they are full
> lines or even whether they originate from the text of a file), but this
> is already true for the existing code. The potential danger is that we
> are now relying more on the format of these strings, but we don't plan
> to do anything other than to color them, so this seems fine.

I will add comments into the code for that.

>
> I would also prefer if there was only one coloring method, to ease
> testing, but I can tolerate the current multiplicity of options.

I *think* by now everyone involved in the discussion agrees that we
want Zebra + optional aggressive dimming (inside blocks as well as
at bounds that are not adjacent to other blocks, i.e. anything
non-adjacent to a different block)


Re: [PATCH] diff.c: color moved lines differently

2017-06-13 Thread Jonathan Tan
On Wed, 31 May 2017 17:24:29 -0700
Stefan Beller  wrote:

> When a patch consists mostly of moving blocks of code around, it can
> be quite tedious to ensure that the blocks are moved verbatim, and not
> undesirably modified in the move. To that end, color blocks that are
> moved within the same patch differently. For example (OM, del, add,
> and NM are different colors):

[snip]

Junio asks "are we happy with these changes" [1] and my answer is, in
general, yes - this seems like a very useful feature to have, and I'm OK
with the current design.

I do feel a bit of unease at how the emitted strings are collected
without many guarantees as to their contents (e.g. whether they are full
lines or even whether they originate from the text of a file), but this
is already true for the existing code. The potential danger is that we
are now relying more on the format of these strings, but we don't plan
to do anything other than to color them, so this seems fine.

I would also prefer if there was only one coloring method, to ease
testing, but I can tolerate the current multiplicity of options.

I think there was some discussion at trying to avoid indicating that
small blocks (consisting of few non-whitespace characters) are moved,
but I think that that can be added later.

[1] 
https://public-inbox.org/git/cagz79ky2z-fjyxczbzheu1hchlkkkdjecdmwsp-hkn0tjub...@mail.gmail.com/

> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -231,6 +231,38 @@ ifdef::git-diff[]
>  endif::git-diff[]
>   It is the same as `--color=never`.
>  
> +--color-moved[=]::
> + Moved lines of code are colored differently.
> +ifdef::git-diff[]
> + It can be changed by the `diff.colorMoved` configuration setting.
> +endif::git-diff[]
> + The  defaults to 'no' if the option is not given
> + and to 'adjacentbounds' if the option with no mode is given.
> + The mode must be one of:
> ++
> +--
> +no::
> + Moved lines are not highlighted.
> +nobounds::
> + Any line that is added in one location and was removed
> + in another location will be colored with 'color.diff.newmoved'.
> + Similarly 'color.diff.oldmoved' will be used for removed lines

Probably best to consistently capitalize newMoved and oldMoved.

> +static unsigned get_line_hash(struct diff_line *line, unsigned ignore_ws)
> +{
> + static struct strbuf sb = STRBUF_INIT;
> +
> + if (ignore_ws) {
> + strbuf_reset();
> + get_ws_cleaned_string(line, );
> + return memhash(sb.buf, sb.len);
> + } else {
> + return memhash(line->line, line->len);
> + }
> +}

Can be written without the "else".

> +test_expect_success 'move detection does not mess up colored words' '

Probably name this test "--color-moved has no effect when used with
--word-diff".


[PATCH] diff.c: color moved lines differently

2017-05-31 Thread Stefan Beller
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OM]  -if (!is_authorized_user())
[OM]  -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OM]  -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NM]  +sensitive_stuff(spanning,
[NM]  +multiple,
[NM]  +lines);
[NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OMA] -if (!is_authorized_user())
[OMA] -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OMA] -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NMA] +sensitive_stuff(spanning,
[NMA] +multiple,
[NMA] +lines);
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

As the reviewers attention should be brought to the places, where the
difference is introduced to the moved code, we cannot just have one new
color for all of moved code.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

Helped-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
---

 Replacing the top commit in origin/sb/diff-color-move, 
 this has the spelling fixes by Philip.
 
 Also a minor fix for the 'alternate' mode, to go back to the default
 after empty lines. Thanks to Jacob.
 
 Thanks,
 Stefan

 Documentation/config.txt   |  10 +-
 Documentation/diff-options.txt |  32 
 color.h|   2 +
 diff.c | 343 +++--
 diff.h |  15 +-
 t/t4015-diff-whitespace.sh | 373 +
 6 files changed, 761 insertions(+), 14 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..73511a4603 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,20 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
 
+diff.colorMoved::
+   If set moved lines in a diff are colored differently,
+   for details see '--color-moved' in linkgit:git-diff[1].
+
 color.diff.::
Use customized color for diff colorization.  `` specifies
which part of the patch to use the specified color, and is one
of `context` (context text - `plain` is a historical synonym),
`meta` (metainformation), `frag`
(hunk header), 'func' (function in hunk header), `old` (removed