Re: [PATCH 3/4] diff-highlight: match up lines before highlighting

2015-11-16 Thread Jonathan Lebon
On Tue, Nov 3, 2015 at 5:03 PM, Jeff King  wrote:
> Your is _much_ slower. I get:
>
>   real0m25.538s
>   user0m25.420s
>   sys 0m0.120s
>
> for the old versus:
>
>   real2m3.580s
>   user2m3.548s
>   sys 0m0.156s

Thanks for investigating and trying it out. I got the same results
here as well.

> for your series. In an interactive setting, the latency may not be that
> noticeable, but if you are digging far into history (e.g., "git log -p",
> then using "/" in less to search for a commit or some test), I suspect
> it would be very noticeable.

Agreed.

> I was thinking there was some low-hanging fruit in memoizing the
> calculations, but even the prefix/suffix computation is pairwise. I'm
> not really sure how to make this much faster.

I gave memoization a try to see if it could improve the situation. I also
lowered maxhunksize to 10. Doing `git log -p` on git.git went from 2m31
to 2m11. So I think it would require a whole other approach overall.

> As for the output itself, the diff between the two looks promising. The
> first several cases I looked at ar strict improvements. Some of them are
> kind of weird, especially in English text.

Yes, I'm very happy with the improvements and run with these patches all
the time for now.

> In the other thread I mentioned earlier, the solution I cooked up was
> dropping highlighting entirely for hunks over a certain percentage of
> highlighting. I wonder if we could do something similar here (e.g.,
> don't match lines where more than 50% of the line would be highlighted).

I looked over but haven't tested the patches in the other thread yet. But
overall, the LCS definitely looks promising. I'm hoping to find some time
to have a more serious go at it and maybe pick it up where you left off.

>
> -Peff

Thanks again for reviewing these patches and apologies for the delayed
reply.

Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] diff-highlight: add `less -r` to cmd in README

2015-11-02 Thread Jonathan Lebon
On Mon, Nov 2, 2015 at 9:41 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> Jonathan Lebon <jonathan.le...@gmail.com> writes:
>
> > As it is, the suggested command for trying out diff-highlight will just
> > dump the whole git log output to the terminal. Let's pipe it through
> > `less` so users aren't surprised on the first try.
>
> That justifies the "less" part but not your choice of "-r".
>
> I am assuming that you are telling "less" not to show the ANSI
> "color" escape sequences using the caret notation with "-r", which
> is a very natural and sensible thing to do when using `highlight`.
>
> But if that is the case, you don't want "-r" (raw control chars for
> everything).  You would want to say "-R", I think.

Ahh thanks, that makes sense. I will update this for v2 tomorrow.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] diff-highlight: add maxhunksize config option

2015-11-02 Thread Jonathan Lebon
As the size of the hunk gets bigger, it becomes harder to jump back and
forth between the removed and added lines, and highlighting becomes less
beneficial. We add a new config option called

diff-highlight.maxhunksize

which controls the maximum size of the hunk allowed for which
highlighting is still performed. The default value is set to 20.

Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com>
---
 contrib/diff-highlight/README | 20 
 contrib/diff-highlight/diff-highlight | 16 
 2 files changed, 36 insertions(+)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 885ff2f..ed12be1 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -89,6 +89,26 @@ newHighlight = "black #aaffaa"
 -
 
 
+Max Hunk Config
+---
+
+By default, diff-highlight will not do any highlighting if either the
+number of removed or added lines is greater than 20. This is because as
+the hunk gets bigger, it becomes harder to jump back and forth between
+the removed and added lines, and highlighting becomes less beneficial.
+
+You can change this default by setting the "diff-highlight.maxhunksize"
+configuration.
+
+Example:
+
+-
+# Increase the maximum diff-highlight to 30
+[diff-highlight]
+maxhunksize = 30
+-
+
+
 Bugs
 
 
diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index 46556fc..a005146 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -17,6 +17,8 @@ my @NEW_HIGHLIGHT = (
color_config('color.diff-highlight.newreset', $OLD_HIGHLIGHT[2])
 );
 
+my $MAX_HUNK_SIZE = config('diff-highlight.maxhunksize', 20);
+
 my $RESET = "\x1b[m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
@@ -79,6 +81,13 @@ sub color_config {
return length($s) ? $s : $default;
 }
 
+# Also handle our own fallback here to be independent.
+sub config {
+   my ($key, $default) = @_;
+   my $s = `git config --get $key 2>/dev/null`;
+   return length($s) ? $s : $default;
+}
+
 sub show_hunk {
my ($a, $b) = @_;
 
@@ -88,6 +97,13 @@ sub show_hunk {
return;
}
 
+   # Skip highlighting if the hunk gets bigger than the user configured
+   # limit.
+   if (@$a > $MAX_HUNK_SIZE || @$b > $MAX_HUNK_SIZE) {
+   print @$a, @$b;
+   return;
+   }
+
my @queue;
match_and_highlight_pairs($a, 0, scalar @$a, $b, 0, scalar @$b, 
\@queue);
print @queue;
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] diff-highlight: add `less -r` to cmd in README

2015-11-02 Thread Jonathan Lebon
As it is, the suggested command for trying out diff-highlight will just
dump the whole git log output to the terminal. Let's pipe it through
`less` so users aren't surprised on the first try.

Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com>
---
 contrib/diff-highlight/README | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 836b97a..bbbfdda 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -44,9 +44,9 @@ Use
 
 You can try out the diff-highlight program with:
 
--
-git log -p --color | /path/to/diff-highlight
--
+--
+git log -p --color | /path/to/diff-highlight | less -r
+--
 
 If you want to use it all the time, drop it in your $PATH and put the
 following in your git configuration:
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] diff-highlight: factor out prefix/suffix functions

2015-11-02 Thread Jonathan Lebon
In preparation for the next patch, we factor out the functions for
finding the common prefix and suffix between two lines.

Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com>
---
 contrib/diff-highlight/diff-highlight | 98 ---
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index ffefc31..a332f86 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -110,48 +110,8 @@ sub highlight_pair {
my @a = split_line(shift);
my @b = split_line(shift);
 
-   # Find common prefix, taking care to skip any ansi
-   # color codes.
-   my $seen_plusminus;
-   my ($pa, $pb) = (0, 0);
-   while ($pa < @a && $pb < @b) {
-   if ($a[$pa] =~ /$COLOR/) {
-   $pa++;
-   }
-   elsif ($b[$pb] =~ /$COLOR/) {
-   $pb++;
-   }
-   elsif ($a[$pa] eq $b[$pb]) {
-   $pa++;
-   $pb++;
-   }
-   elsif (!$seen_plusminus && $a[$pa] eq '-' && $b[$pb] eq '+') {
-   $seen_plusminus = 1;
-   $pa++;
-   $pb++;
-   }
-   else {
-   last;
-   }
-   }
-
-   # Find common suffix, ignoring colors.
-   my ($sa, $sb) = ($#a, $#b);
-   while ($sa >= $pa && $sb >= $pb) {
-   if ($a[$sa] =~ /$COLOR/) {
-   $sa--;
-   }
-   elsif ($b[$sb] =~ /$COLOR/) {
-   $sb--;
-   }
-   elsif ($a[$sa] eq $b[$sb]) {
-   $sa--;
-   $sb--;
-   }
-   else {
-   last;
-   }
-   }
+   my ($pa, $pb) = find_common_prefix(\@a, \@b);
+   my ($sa, $sb) = find_common_suffix(\@a, $pa, \@b, $pb);
 
if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
return highlight_line(\@a, $pa, $sa, \@OLD_HIGHLIGHT),
@@ -173,6 +133,60 @@ sub split_line {
split /($COLOR+)/;
 }
 
+sub find_common_prefix {
+   my ($a, $b) = @_;
+
+   # Take care to skip any ansi color codes.
+   my $seen_plusminus;
+   my ($pa, $pb) = (0, 0);
+   while ($pa < @$a && $pb < @$b) {
+   if ($a->[$pa] =~ /$COLOR/) {
+   $pa++;
+   }
+   elsif ($b->[$pb] =~ /$COLOR/) {
+   $pb++;
+   }
+   elsif ($a->[$pa] eq $b->[$pb]) {
+   $pa++;
+   $pb++;
+   }
+   elsif (!$seen_plusminus && $a->[$pa] eq '-' && $b->[$pb] eq 
'+') {
+   $seen_plusminus = 1;
+   $pa++;
+   $pb++;
+   }
+   else {
+   last;
+   }
+   }
+
+   return $pa, $pb;
+}
+
+sub find_common_suffix {
+   my ($a, $pa, $b, $pb) = @_;
+
+   # Take care to skip any ansi color codes.
+   my ($sa, $sb) = ($#$a, $#$b);
+   while ($sa >= $pa && $sb >= $pb) {
+   if ($a->[$sa] =~ /$COLOR/) {
+   $sa--;
+   }
+   elsif ($b->[$sb] =~ /$COLOR/) {
+   $sb--;
+   }
+   elsif ($a->[$sa] eq $b->[$sb]) {
+   $sa--;
+   $sb--;
+   }
+   else {
+   last;
+   }
+   }
+
+   return $sa, $sb;
+}
+
 sub highlight_line {
my ($line, $prefix, $suffix, $theme) = @_;
 
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] diff-highlight: match up lines before highlighting

2015-11-02 Thread Jonathan Lebon
As mentioned in the README, one of the current limitations of
diff-highlight is that it only calculates highlights when the hunk
contains the same number of removed lines as added lines.

A further limitation upon this is that diff-highlight assumes that the
first line removed matches the first line added, similarly with the
second, the third, etc... As was demonstrated in the "Bugs" section of
the README, this poses limitations since that assumption does not always
give the best result.

With this patch, we eliminate those limitations by trying to match up
the removed and added lines before highlighting them. This is done using
a recursive algorithm.

Note that I did not bother with some common optimizations such as
memoization since the usual number of removed/added lines in a single
hunk are small. In practice, I have not felt any lag at all during
paging.

Signed-off-by: Jonathan Lebon <jonathan.le...@gmail.com>
---
 contrib/diff-highlight/README | 61 +
 contrib/diff-highlight/diff-highlight | 83 +--
 2 files changed, 70 insertions(+), 74 deletions(-)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index bbbfdda..885ff2f 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -14,17 +14,7 @@ Instead, this script post-processes the line-oriented diff, 
finds pairs
 of lines, and highlights the differing segments.  It's currently very
 simple and stupid about doing these tasks. In particular:
 
-  1. It will only highlight hunks in which the number of removed and
- added lines is the same, and it will pair lines within the hunk by
- position (so the first removed line is compared to the first added
- line, and so forth). This is simple and tends to work well in
- practice. More complex changes don't highlight well, so we tend to
- exclude them due to the "same number of removed and added lines"
- restriction. Or even if we do try to highlight them, they end up
- not highlighting because of our "don't highlight if the whole line
- would be highlighted" rule.
-
-  2. It will find the common prefix and suffix of two lines, and
+  1. It will find the common prefix and suffix of two lines, and
  consider everything in the middle to be "different". It could
  instead do a real diff of the characters between the two lines and
  find common subsequences. However, the point of the highlight is to
@@ -142,52 +132,3 @@ heuristics.
 -
 
which is less readable than the current output.
-
-2. The multi-line matching assumes that lines in the pre- and post-image
-   match by position. This is often the case, but can be fooled when a
-   line is removed from the top and a new one added at the bottom (or
-   vice versa). Unless the lines in the middle are also changed, diffs
-   will show this as two hunks, and it will not get highlighted at all
-   (which is good). But if the lines in the middle are changed, the
-   highlighting can be misleading. Here's a pathological case:
-
--
--one
--two
--three
--four
-+two 2
-+three 3
-+four 4
-+five 5
--
-
-   which gets highlighted as:
-
--
--one
--t-{wo}
--three
--f-{our}
-+two 2
-+t+{hree 3}
-+four 4
-+f+{ive 5}
--
-
-   because it matches "two" to "three 3", and so forth. It would be
-   nicer as:
-
--
--one
--two
--three
--four
-+two +{2}
-+three +{3}
-+four +{4}
-+five 5
--
-
-   which would probably involve pre-matching the lines into pairs
-   according to some heuristic.
diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index a332f86..46556fc 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -88,24 +88,79 @@ sub show_hunk {
return;
}
 
-   # If we have mismatched numbers of lines on each side, we could try to
-   # be clever and match up similar lines. But for now we are simple and
-   # stupid, and only handle multi-line hunks that remove and add the same
-   # number of lines.
-   if (@$a != @$b) {
-   print @$a, @$b;
-   return;
-   }
-
my @queue;
-   for (my $i = 0; $i < @$a; $i++) {
-   my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
-   print $rm;
-   push @queue, $add;
-   }
+   match_and_highlight_pairs($a, 0, scalar @$a, $b, 0, scalar @$b, 
\@queue);
print @queue;
 }
 
+# Here, we try to be clever and match up similar lines. I.e. we try t

[PATCH 0/4] diff-highlight: make a few improvements

2015-11-02 Thread Jonathan Lebon
These patches bring a few improvements to the contrib/diff-highlight
Perl script. The major improvement is done in patch 3/4, which improves
diff-highlighting accuracy by implementing a recursive line matching
algorithm.

Please note that I have limited experience with Perl, so there may be
better ways to do things. (Let me know if that is the case!)

Jonathan Lebon (4):
  diff-highlight: add `less -r` to cmd in README
  diff-highlight: factor out prefix/suffix functions
  diff-highlight: match up lines before highlighting
  diff-highlight: add maxhunksize config option

 contrib/diff-highlight/README |  87 +---
 contrib/diff-highlight/diff-highlight | 189 --
 2 files changed, 161 insertions(+), 115 deletions(-)

-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html