Re: [PATCH 3/4] diff-highlight: match up lines before highlighting
On Tue, Nov 3, 2015 at 5:03 PM, Jeff Kingwrote: > Your is _much_ slower. I get: > > real0m25.538s > user0m25.420s > sys 0m0.120s > > for the old versus: > > real2m3.580s > user2m3.548s > sys 0m0.156s Thanks for investigating and trying it out. I got the same results here as well. > for your series. In an interactive setting, the latency may not be that > noticeable, but if you are digging far into history (e.g., "git log -p", > then using "/" in less to search for a commit or some test), I suspect > it would be very noticeable. Agreed. > I was thinking there was some low-hanging fruit in memoizing the > calculations, but even the prefix/suffix computation is pairwise. I'm > not really sure how to make this much faster. I gave memoization a try to see if it could improve the situation. I also lowered maxhunksize to 10. Doing `git log -p` on git.git went from 2m31 to 2m11. So I think it would require a whole other approach overall. > As for the output itself, the diff between the two looks promising. The > first several cases I looked at ar strict improvements. Some of them are > kind of weird, especially in English text. Yes, I'm very happy with the improvements and run with these patches all the time for now. > In the other thread I mentioned earlier, the solution I cooked up was > dropping highlighting entirely for hunks over a certain percentage of > highlighting. I wonder if we could do something similar here (e.g., > don't match lines where more than 50% of the line would be highlighted). I looked over but haven't tested the patches in the other thread yet. But overall, the LCS definitely looks promising. I'm hoping to find some time to have a more serious go at it and maybe pick it up where you left off. > > -Peff Thanks again for reviewing these patches and apologies for the delayed reply. Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] diff-highlight: add `less -r` to cmd in README
On Mon, Nov 2, 2015 at 9:41 PM, Junio C Hamano <gits...@pobox.com> wrote: > > Jonathan Lebon <jonathan.le...@gmail.com> writes: > > > As it is, the suggested command for trying out diff-highlight will just > > dump the whole git log output to the terminal. Let's pipe it through > > `less` so users aren't surprised on the first try. > > That justifies the "less" part but not your choice of "-r". > > I am assuming that you are telling "less" not to show the ANSI > "color" escape sequences using the caret notation with "-r", which > is a very natural and sensible thing to do when using `highlight`. > > But if that is the case, you don't want "-r" (raw control chars for > everything). You would want to say "-R", I think. Ahh thanks, that makes sense. I will update this for v2 tomorrow. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] diff-highlight: add maxhunksize config option
As the size of the hunk gets bigger, it becomes harder to jump back and forth between the removed and added lines, and highlighting becomes less beneficial. We add a new config option called diff-highlight.maxhunksize which controls the maximum size of the hunk allowed for which highlighting is still performed. The default value is set to 20. Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com> --- contrib/diff-highlight/README | 20 contrib/diff-highlight/diff-highlight | 16 2 files changed, 36 insertions(+) diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README index 885ff2f..ed12be1 100644 --- a/contrib/diff-highlight/README +++ b/contrib/diff-highlight/README @@ -89,6 +89,26 @@ newHighlight = "black #aaffaa" - +Max Hunk Config +--- + +By default, diff-highlight will not do any highlighting if either the +number of removed or added lines is greater than 20. This is because as +the hunk gets bigger, it becomes harder to jump back and forth between +the removed and added lines, and highlighting becomes less beneficial. + +You can change this default by setting the "diff-highlight.maxhunksize" +configuration. + +Example: + +- +# Increase the maximum diff-highlight to 30 +[diff-highlight] +maxhunksize = 30 +- + + Bugs diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index 46556fc..a005146 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -17,6 +17,8 @@ my @NEW_HIGHLIGHT = ( color_config('color.diff-highlight.newreset', $OLD_HIGHLIGHT[2]) ); +my $MAX_HUNK_SIZE = config('diff-highlight.maxhunksize', 20); + my $RESET = "\x1b[m"; my $COLOR = qr/\x1b\[[0-9;]*m/; my $BORING = qr/$COLOR|\s/; @@ -79,6 +81,13 @@ sub color_config { return length($s) ? $s : $default; } +# Also handle our own fallback here to be independent. +sub config { + my ($key, $default) = @_; + my $s = `git config --get $key 2>/dev/null`; + return length($s) ? $s : $default; +} + sub show_hunk { my ($a, $b) = @_; @@ -88,6 +97,13 @@ sub show_hunk { return; } + # Skip highlighting if the hunk gets bigger than the user configured + # limit. + if (@$a > $MAX_HUNK_SIZE || @$b > $MAX_HUNK_SIZE) { + print @$a, @$b; + return; + } + my @queue; match_and_highlight_pairs($a, 0, scalar @$a, $b, 0, scalar @$b, \@queue); print @queue; -- 2.6.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] diff-highlight: add `less -r` to cmd in README
As it is, the suggested command for trying out diff-highlight will just dump the whole git log output to the terminal. Let's pipe it through `less` so users aren't surprised on the first try. Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com> --- contrib/diff-highlight/README | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README index 836b97a..bbbfdda 100644 --- a/contrib/diff-highlight/README +++ b/contrib/diff-highlight/README @@ -44,9 +44,9 @@ Use You can try out the diff-highlight program with: -- -git log -p --color | /path/to/diff-highlight -- +-- +git log -p --color | /path/to/diff-highlight | less -r +-- If you want to use it all the time, drop it in your $PATH and put the following in your git configuration: -- 2.6.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] diff-highlight: factor out prefix/suffix functions
In preparation for the next patch, we factor out the functions for finding the common prefix and suffix between two lines. Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com> --- contrib/diff-highlight/diff-highlight | 98 --- 1 file changed, 56 insertions(+), 42 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index ffefc31..a332f86 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -110,48 +110,8 @@ sub highlight_pair { my @a = split_line(shift); my @b = split_line(shift); - # Find common prefix, taking care to skip any ansi - # color codes. - my $seen_plusminus; - my ($pa, $pb) = (0, 0); - while ($pa < @a && $pb < @b) { - if ($a[$pa] =~ /$COLOR/) { - $pa++; - } - elsif ($b[$pb] =~ /$COLOR/) { - $pb++; - } - elsif ($a[$pa] eq $b[$pb]) { - $pa++; - $pb++; - } - elsif (!$seen_plusminus && $a[$pa] eq '-' && $b[$pb] eq '+') { - $seen_plusminus = 1; - $pa++; - $pb++; - } - else { - last; - } - } - - # Find common suffix, ignoring colors. - my ($sa, $sb) = ($#a, $#b); - while ($sa >= $pa && $sb >= $pb) { - if ($a[$sa] =~ /$COLOR/) { - $sa--; - } - elsif ($b[$sb] =~ /$COLOR/) { - $sb--; - } - elsif ($a[$sa] eq $b[$sb]) { - $sa--; - $sb--; - } - else { - last; - } - } + my ($pa, $pb) = find_common_prefix(\@a, \@b); + my ($sa, $sb) = find_common_suffix(\@a, $pa, \@b, $pb); if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) { return highlight_line(\@a, $pa, $sa, \@OLD_HIGHLIGHT), @@ -173,6 +133,60 @@ sub split_line { split /($COLOR+)/; } +sub find_common_prefix { + my ($a, $b) = @_; + + # Take care to skip any ansi color codes. + my $seen_plusminus; + my ($pa, $pb) = (0, 0); + while ($pa < @$a && $pb < @$b) { + if ($a->[$pa] =~ /$COLOR/) { + $pa++; + } + elsif ($b->[$pb] =~ /$COLOR/) { + $pb++; + } + elsif ($a->[$pa] eq $b->[$pb]) { + $pa++; + $pb++; + } + elsif (!$seen_plusminus && $a->[$pa] eq '-' && $b->[$pb] eq '+') { + $seen_plusminus = 1; + $pa++; + $pb++; + } + else { + last; + } + } + + return $pa, $pb; +} + +sub find_common_suffix { + my ($a, $pa, $b, $pb) = @_; + + # Take care to skip any ansi color codes. + my ($sa, $sb) = ($#$a, $#$b); + while ($sa >= $pa && $sb >= $pb) { + if ($a->[$sa] =~ /$COLOR/) { + $sa--; + } + elsif ($b->[$sb] =~ /$COLOR/) { + $sb--; + } + elsif ($a->[$sa] eq $b->[$sb]) { + $sa--; + $sb--; + } + else { + last; + } + } + + return $sa, $sb; +} + sub highlight_line { my ($line, $prefix, $suffix, $theme) = @_; -- 2.6.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] diff-highlight: match up lines before highlighting
As mentioned in the README, one of the current limitations of diff-highlight is that it only calculates highlights when the hunk contains the same number of removed lines as added lines. A further limitation upon this is that diff-highlight assumes that the first line removed matches the first line added, similarly with the second, the third, etc... As was demonstrated in the "Bugs" section of the README, this poses limitations since that assumption does not always give the best result. With this patch, we eliminate those limitations by trying to match up the removed and added lines before highlighting them. This is done using a recursive algorithm. Note that I did not bother with some common optimizations such as memoization since the usual number of removed/added lines in a single hunk are small. In practice, I have not felt any lag at all during paging. Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com> --- contrib/diff-highlight/README | 61 + contrib/diff-highlight/diff-highlight | 83 +-- 2 files changed, 70 insertions(+), 74 deletions(-) diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README index bbbfdda..885ff2f 100644 --- a/contrib/diff-highlight/README +++ b/contrib/diff-highlight/README @@ -14,17 +14,7 @@ Instead, this script post-processes the line-oriented diff, finds pairs of lines, and highlights the differing segments. It's currently very simple and stupid about doing these tasks. In particular: - 1. It will only highlight hunks in which the number of removed and - added lines is the same, and it will pair lines within the hunk by - position (so the first removed line is compared to the first added - line, and so forth). This is simple and tends to work well in - practice. More complex changes don't highlight well, so we tend to - exclude them due to the "same number of removed and added lines" - restriction. Or even if we do try to highlight them, they end up - not highlighting because of our "don't highlight if the whole line - would be highlighted" rule. - - 2. It will find the common prefix and suffix of two lines, and + 1. It will find the common prefix and suffix of two lines, and consider everything in the middle to be "different". It could instead do a real diff of the characters between the two lines and find common subsequences. However, the point of the highlight is to @@ -142,52 +132,3 @@ heuristics. - which is less readable than the current output. - -2. The multi-line matching assumes that lines in the pre- and post-image - match by position. This is often the case, but can be fooled when a - line is removed from the top and a new one added at the bottom (or - vice versa). Unless the lines in the middle are also changed, diffs - will show this as two hunks, and it will not get highlighted at all - (which is good). But if the lines in the middle are changed, the - highlighting can be misleading. Here's a pathological case: - -- --one --two --three --four -+two 2 -+three 3 -+four 4 -+five 5 -- - - which gets highlighted as: - -- --one --t-{wo} --three --f-{our} -+two 2 -+t+{hree 3} -+four 4 -+f+{ive 5} -- - - because it matches "two" to "three 3", and so forth. It would be - nicer as: - -- --one --two --three --four -+two +{2} -+three +{3} -+four +{4} -+five 5 -- - - which would probably involve pre-matching the lines into pairs - according to some heuristic. diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index a332f86..46556fc 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -88,24 +88,79 @@ sub show_hunk { return; } - # If we have mismatched numbers of lines on each side, we could try to - # be clever and match up similar lines. But for now we are simple and - # stupid, and only handle multi-line hunks that remove and add the same - # number of lines. - if (@$a != @$b) { - print @$a, @$b; - return; - } - my @queue; - for (my $i = 0; $i < @$a; $i++) { - my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]); - print $rm; - push @queue, $add; - } + match_and_highlight_pairs($a, 0, scalar @$a, $b, 0, scalar @$b, \@queue); print @queue; } +# Here, we try to be clever and match up similar lines. I.e. we try t
[PATCH 0/4] diff-highlight: make a few improvements
These patches bring a few improvements to the contrib/diff-highlight Perl script. The major improvement is done in patch 3/4, which improves diff-highlighting accuracy by implementing a recursive line matching algorithm. Please note that I have limited experience with Perl, so there may be better ways to do things. (Let me know if that is the case!) Jonathan Lebon (4): diff-highlight: add `less -r` to cmd in README diff-highlight: factor out prefix/suffix functions diff-highlight: match up lines before highlighting diff-highlight: add maxhunksize config option contrib/diff-highlight/README | 87 +--- contrib/diff-highlight/diff-highlight | 189 -- 2 files changed, 161 insertions(+), 115 deletions(-) -- 2.6.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html