Re: [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
On Mon, Jul 2, 2018 at 10:36 AM Brandon Williams wrote: > > On 06/28, Stefan Beller wrote: > > The option of --color-moved has proven to be useful as observed on the > > mailing list. However when refactoring sometimes the indentation changes, > > for example when partitioning a functions into smaller helper functions > > the code usually mostly moved around except for a decrease in indentation. > > > > To just review the moved code ignoring the change in indentation, a mode > > to ignore spaces in the move detection as implemented in a previous patch > > would be enough. However the whole move coloring as motivated in commit > > 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought > > up the notion of the reviewer being able to trust the move of a "block". > > > > As there are languages such as python, which depend on proper relative > > indentation for the control flow of the program, ignoring any white space > > change in a block would not uphold the promises of 2e2d5ac that allows > > reviewers to pay less attention to the inside of a block, as inside > > the reviewer wants to assume the same program flow. > > > > This new mode of white space ignorance will take this into account and will > > only allow the same white space changes per line in each block. This patch > > even allows only for the same change at the beginning of the lines. > > > > As this is a white space mode, it is made exclusive to other white space > > modes in the move detection. > > > > This patch brings some challenges, related to the detection of blocks. > > We need a white net the catch the possible moved lines, but then need to > > s/white/wide/ oops. will fix. > > > + > > +/** > > + * The struct ws_delta holds white space differences between moved lines, > > i.e. > > + * between '+' and '-' lines that have been detected to be a move. > > + * The string contains the difference in leading white spaces, before the > > + * rest of the line is compared using the white space config for move > > + * coloring. The current_longer indicates if the first string in the > > + * comparision is longer than the second. > > + */ > > +struct ws_delta { > > + char *string; > > + unsigned int current_longer : 1; > > }; > > +#define WS_DELTA_INIT { NULL, 0 } > > + > > +static int compute_ws_delta(const struct emitted_diff_symbol *a, > > + const struct emitted_diff_symbol *b, > > + struct ws_delta *out) > > +{ > > + const struct emitted_diff_symbol *longer = a->len > b->len ? a : b; > > + const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a; > > + int d = longer->len - shorter->len; > > + > > + out->string = xmemdupz(longer->line, d); > > + out->current_longer = (a == longer); > > + > > + return !strncmp(longer->line + d, shorter->line, shorter->len); > > +} > > I'm having a harder time understanding this block. This is used to fill > a ws_delta struct by calculating the whitespace delta between two lines. yes. > If that is the case then why doesn't this function verify that the first > 'd' characters in the longer line are indeed whitespace? This was done implicitly before as compute_ws_delta is called only on two lines that are "equal with XDF_IGNORE_WHITESPACE". > Also, maybe > this is just because I'm not as familiar with the move detection code, > but how would the whitespace detection handle a line being moved from > being indented with tabs to spaces or vice versa? Is this something > already handled and not an issue? The ws_delta stores the string (of whitespaces) that the 'shorter string' needs to be prefixed with to create the 'longer string. (I chose 'shorter' and 'longer' as any of them can be '+' or '-' lines) Then later when we compare the strings (the current line in consideration and strings that we put in hashmaps that we know are moved lines) in cmp_in_block_with_wsd, we (should) take this white space delta into account. I just realize that there we do not check for the tab/space replacement as there we would need to compare the ws_delta string to the beginning of the 'longer' string there. I'll add a test for the space/tab replacement as well. Thanks, Stefan
Re: [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
On 06/28, Stefan Beller wrote: > The option of --color-moved has proven to be useful as observed on the > mailing list. However when refactoring sometimes the indentation changes, > for example when partitioning a functions into smaller helper functions > the code usually mostly moved around except for a decrease in indentation. > > To just review the moved code ignoring the change in indentation, a mode > to ignore spaces in the move detection as implemented in a previous patch > would be enough. However the whole move coloring as motivated in commit > 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought > up the notion of the reviewer being able to trust the move of a "block". > > As there are languages such as python, which depend on proper relative > indentation for the control flow of the program, ignoring any white space > change in a block would not uphold the promises of 2e2d5ac that allows > reviewers to pay less attention to the inside of a block, as inside > the reviewer wants to assume the same program flow. > > This new mode of white space ignorance will take this into account and will > only allow the same white space changes per line in each block. This patch > even allows only for the same change at the beginning of the lines. > > As this is a white space mode, it is made exclusive to other white space > modes in the move detection. > > This patch brings some challenges, related to the detection of blocks. > We need a white net the catch the possible moved lines, but then need to s/white/wide/ > + > +/** > + * The struct ws_delta holds white space differences between moved lines, > i.e. > + * between '+' and '-' lines that have been detected to be a move. > + * The string contains the difference in leading white spaces, before the > + * rest of the line is compared using the white space config for move > + * coloring. The current_longer indicates if the first string in the > + * comparision is longer than the second. > + */ > +struct ws_delta { > + char *string; > + unsigned int current_longer : 1; > }; > +#define WS_DELTA_INIT { NULL, 0 } > + > +static int compute_ws_delta(const struct emitted_diff_symbol *a, > + const struct emitted_diff_symbol *b, > + struct ws_delta *out) > +{ > + const struct emitted_diff_symbol *longer = a->len > b->len ? a : b; > + const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a; > + int d = longer->len - shorter->len; > + > + out->string = xmemdupz(longer->line, d); > + out->current_longer = (a == longer); > + > + return !strncmp(longer->line + d, shorter->line, shorter->len); > +} I'm having a harder time understanding this block. This is used to fill a ws_delta struct by calculating the whitespace delta between two lines. If that is the case then why doesn't this function verify that the first 'd' characters in the longer line are indeed whitespace? Also, maybe this is just because I'm not as familiar with the move detection code, but how would the whitespace detection handle a line being moved from being indented with tabs to spaces or vice versa? Is this something already handled and not an issue? -- Brandon Williams
[PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
The option of --color-moved has proven to be useful as observed on the mailing list. However when refactoring sometimes the indentation changes, for example when partitioning a functions into smaller helper functions the code usually mostly moved around except for a decrease in indentation. To just review the moved code ignoring the change in indentation, a mode to ignore spaces in the move detection as implemented in a previous patch would be enough. However the whole move coloring as motivated in commit 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought up the notion of the reviewer being able to trust the move of a "block". As there are languages such as python, which depend on proper relative indentation for the control flow of the program, ignoring any white space change in a block would not uphold the promises of 2e2d5ac that allows reviewers to pay less attention to the inside of a block, as inside the reviewer wants to assume the same program flow. This new mode of white space ignorance will take this into account and will only allow the same white space changes per line in each block. This patch even allows only for the same change at the beginning of the lines. As this is a white space mode, it is made exclusive to other white space modes in the move detection. This patch brings some challenges, related to the detection of blocks. We need a white net the catch the possible moved lines, but then need to narrow down to check if the blocks are still in tact. Consider this example (ignoring block sizes): - A - B - C +A +B +C At the beginning of a block when checking if there is a counterpart for A, we have to ignore all space changes. However at the following lines we have to check if the indent change stayed the same. Checking if the indentation change did stay the same, is done by computing the indentation change by the difference in line length, and then assume the change is only in the beginning of the longer line, the common tail is the same. That is why the test contains lines like: - A ... + A ... As the first line starting a block is caught using a compare function that ignores white spaces unlike the rest of the block, where the white space delta is taken into account for the comparison, we also have to think about the following situation: - A - B - A - B +A +B + A + B When checking if the first A (both in the + and - lines) is a start of a block, we have to check all 'A' and record all the white space deltas such that we can find the example above to be just one block that is indented. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- Documentation/diff-options.txt | 5 ++ diff.c | 158 - diff.h | 3 + t/t4015-diff-whitespace.sh | 98 ++-- 4 files changed, 256 insertions(+), 8 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 80e29e39854..143acd9417e 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -307,6 +307,11 @@ ignore-space-change:: ignore-all-space:: Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none. +allow-indentation-change:: + Initially ignore any white spaces in the move detection, then + group the moved code blocks only into a block if the change in + whitespace is the same per line. This is incompatible with the + other modes. -- --word-diff[=]:: diff --git a/diff.c b/diff.c index 4963819e530..f51f0ac32f4 100644 --- a/diff.c +++ b/diff.c @@ -302,12 +302,18 @@ static int parse_color_moved_ws(const char *arg) ret |= XDF_IGNORE_WHITESPACE_AT_EOL; else if (!strcmp(sb.buf, "ignore-all-space")) ret |= XDF_IGNORE_WHITESPACE; + else if (!strcmp(sb.buf, "allow-indentation-change")) + ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE; else error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf); strbuf_release(&sb); } + if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) && + (ret & XDF_WHITESPACE_FLAGS)) + die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes")); + string_list_clear(&l, 0); return ret; @@ -737,7 +743,91 @@ struct moved_entry { struct hashmap_entry ent; const struct emitted_diff_symbol *es; struct moved_entry *next_line; + struct ws_delta *wsd; +}; + +/** + * The struct ws_delta holds white space differences between moved lines, i.e. + * between '+' and '-' lines that have been detected to be a move. + * The string contains the difference in leading white spaces, before the + * res