Re: [PATCH] diff.c: color moved lines differently
On Tue, Jun 13, 2017 at 3:51 PM, Jonathan Tanwrote: > On Wed, 31 May 2017 17:24:29 -0700 > Stefan Beller wrote: > >> When a patch consists mostly of moving blocks of code around, it can >> be quite tedious to ensure that the blocks are moved verbatim, and not >> undesirably modified in the move. To that end, color blocks that are >> moved within the same patch differently. For example (OM, del, add, >> and NM are different colors): > > [snip] > > Junio asks "are we happy with these changes" [1] and my answer is, in > general, yes - this seems like a very useful feature to have, and I'm OK > with the current design. > > I do feel a bit of unease at how the emitted strings are collected > without many guarantees as to their contents (e.g. whether they are full > lines or even whether they originate from the text of a file), but this > is already true for the existing code. The potential danger is that we > are now relying more on the format of these strings, but we don't plan > to do anything other than to color them, so this seems fine. I will add comments into the code for that. > > I would also prefer if there was only one coloring method, to ease > testing, but I can tolerate the current multiplicity of options. I *think* by now everyone involved in the discussion agrees that we want Zebra + optional aggressive dimming (inside blocks as well as at bounds that are not adjacent to other blocks, i.e. anything non-adjacent to a different block)
Re: [PATCH] diff.c: color moved lines differently
On Wed, 31 May 2017 17:24:29 -0700 Stefan Bellerwrote: > When a patch consists mostly of moving blocks of code around, it can > be quite tedious to ensure that the blocks are moved verbatim, and not > undesirably modified in the move. To that end, color blocks that are > moved within the same patch differently. For example (OM, del, add, > and NM are different colors): [snip] Junio asks "are we happy with these changes" [1] and my answer is, in general, yes - this seems like a very useful feature to have, and I'm OK with the current design. I do feel a bit of unease at how the emitted strings are collected without many guarantees as to their contents (e.g. whether they are full lines or even whether they originate from the text of a file), but this is already true for the existing code. The potential danger is that we are now relying more on the format of these strings, but we don't plan to do anything other than to color them, so this seems fine. I would also prefer if there was only one coloring method, to ease testing, but I can tolerate the current multiplicity of options. I think there was some discussion at trying to avoid indicating that small blocks (consisting of few non-whitespace characters) are moved, but I think that that can be added later. [1] https://public-inbox.org/git/cagz79ky2z-fjyxczbzheu1hchlkkkdjecdmwsp-hkn0tjub...@mail.gmail.com/ > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -231,6 +231,38 @@ ifdef::git-diff[] > endif::git-diff[] > It is the same as `--color=never`. > > +--color-moved[=]:: > + Moved lines of code are colored differently. > +ifdef::git-diff[] > + It can be changed by the `diff.colorMoved` configuration setting. > +endif::git-diff[] > + The defaults to 'no' if the option is not given > + and to 'adjacentbounds' if the option with no mode is given. > + The mode must be one of: > ++ > +-- > +no:: > + Moved lines are not highlighted. > +nobounds:: > + Any line that is added in one location and was removed > + in another location will be colored with 'color.diff.newmoved'. > + Similarly 'color.diff.oldmoved' will be used for removed lines Probably best to consistently capitalize newMoved and oldMoved. > +static unsigned get_line_hash(struct diff_line *line, unsigned ignore_ws) > +{ > + static struct strbuf sb = STRBUF_INIT; > + > + if (ignore_ws) { > + strbuf_reset(); > + get_ws_cleaned_string(line, ); > + return memhash(sb.buf, sb.len); > + } else { > + return memhash(line->line, line->len); > + } > +} Can be written without the "else". > +test_expect_success 'move detection does not mess up colored words' ' Probably name this test "--color-moved has no effect when used with --word-diff".
[PATCH] diff.c: color moved lines differently
When a patch consists mostly of moving blocks of code around, it can be quite tedious to ensure that the blocks are moved verbatim, and not undesirably modified in the move. To that end, color blocks that are moved within the same patch differently. For example (OM, del, add, and NM are different colors): [OM] -void sensitive_stuff(void) [OM] -{ [OM] -if (!is_authorized_user()) [OM] -die("unauthorized"); [OM] -sensitive_stuff(spanning, [OM] -multiple, [OM] -lines); [OM] -} void another_function() { [del] -printf("foo"); [add] +printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NM] +if (!is_authorized_user()) [NM] +die("unauthorized"); [NM] +sensitive_stuff(spanning, [NM] +multiple, [NM] +lines); [NM] +} However adjacent blocks may be problematic. For example, in this potentially malicious patch, the swapping of blocks can be spotted: [OM] -void sensitive_stuff(void) [OM] -{ [OMA] -if (!is_authorized_user()) [OMA] -die("unauthorized"); [OM] -sensitive_stuff(spanning, [OM] -multiple, [OM] -lines); [OMA] -} void another_function() { [del] -printf("foo"); [add] +printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NMA] +sensitive_stuff(spanning, [NMA] +multiple, [NMA] +lines); [NM] +if (!is_authorized_user()) [NM] +die("unauthorized"); [NMA] +} If the moved code is larger, it is easier to hide some permutation in the code, which is why some alternative coloring is needed. As the reviewers attention should be brought to the places, where the difference is introduced to the moved code, we cannot just have one new color for all of moved code. First I implemented an alternative design, which would try to fingerprint a line by its neighbors to detect if we are in a block or at the boundary. This idea iss error prone as it inspected each line and its neighboring lines to determine if the line was (a) moved and (b) if was deep inside a hunk by having matching neighboring lines. This is unreliable as the we can construct hunks which have equal neighbors that just exceed the number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that is permutated to AXYZCXYZBXYZD..'). Instead this provides a dynamic programming greedy algorithm that finds the largest moved hunk and then has several modes on highlighting bounds. A note on the options '--submodule=diff' and '--color-words/--word-diff': In the conversion to use emit_line in the prior patches both submodules as well as word diff output carefully chose to call emit_line with sign=0. All output with sign=0 is ignored for move detection purposes in this patch, such that no weird looking output will be generated for these cases. This leads to another thought: We could pass on '--color-moved' to submodules such that they color up moved lines for themselves. If we'd do so only line moves within a repository boundary are marked up. Helped-by: Jonathan TanSigned-off-by: Stefan Beller --- Replacing the top commit in origin/sb/diff-color-move, this has the spelling fixes by Philip. Also a minor fix for the 'alternate' mode, to go back to the default after empty lines. Thanks to Jacob. Thanks, Stefan Documentation/config.txt | 10 +- Documentation/diff-options.txt | 32 color.h| 2 + diff.c | 343 +++-- diff.h | 15 +- t/t4015-diff-whitespace.sh | 373 + 6 files changed, 761 insertions(+), 14 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 475e874d51..73511a4603 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1051,14 +1051,20 @@ This does not affect linkgit:git-format-patch[1] or the 'git-diff-{asterisk}' plumbing commands. Can be overridden on the command line with the `--color[=]` option. +diff.colorMoved:: + If set moved lines in a diff are colored differently, + for details see '--color-moved' in linkgit:git-diff[1]. + color.diff.:: Use customized color for diff colorization. `` specifies which part of the patch to use the specified color, and is one of `context` (context text - `plain` is a historical synonym), `meta` (metainformation), `frag` (hunk header), 'func' (function in hunk header), `old` (removed