Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, 18 Jun 2015, Jeff King wrote: On Thu, Jun 18, 2015 at 04:14:19PM -0400, Patrick Palka wrote: in a test script becomes more clear. But some of the output is not so great. For instance, the very commit under discussion has a confusing and useless highlight. Or take a documentation patch like 5c31acfb, where I find the highlights actively distracting. We are saved a little by the "if the whole line is different, do not highlight at all" behavior of 097128d1bc. To fix the useless highlights for both evenly and unevenly sized hunks (like when all but a semicolon on a line changes), one can loosen the criterion for not highlighting from "do not highlight if 0% of the before and after lines are common between them" to, say, "do not highlight if less than 10% of the before and after lines are common between them". Then most of these useless highlights are gone for both evenly and unevenly sized hunks. Yeah, this is an idea I had considered but never actually experimented with. It does make some things better, but it also makes some a little worse. For example, in 8dbf3eb, the hunk: - const char *plain = diff_get_color(ecb->color_diff, - DIFF_PLAIN); + const char *context = diff_get_color(ecb->color_diff, +DIFF_CONTEXT); currently gets the plain/context change in the first line highlighted, as well as the DIFF_PLAIN/DIFF_CONTEXT in the second line. With a 10% limit, the second line isn't highlighted. That's correct by the heuristic, but it's a bit harder to read, because the highlight draws your eye to the first change, and it is easy to miss the second. Good example, this actually exposes a "bug" in the heuristic. Each line is around 15 characters long and the common affix ");" is 2 characters long, which is about 15% of 15. So the DIFF_PLAIN/DIFF_CONTEXT pair ought to be highlighted. The patch was unintentionally comparing the lengths of the common affixes against the length of the entire line, whitespace and color codes and all. In effect this meant that the 10% threshold was much higher. We should compare against the length of the non-boring parts of the whole line when determining the percentage in common. Attached is a revised patch. Still, I think this is probably a minority case, and it may be outweighed by the improvements. The "real" solution is to consider the hunk as a whole and do an LCS diff on it, which would show that yes, it's worth highlighting both of those spots, as they are a small percentage of the total hunk. Here is a patch that changes the criterion as mentioned. Testing this change on the documentation patch 5c31acfb, only two pairs of lines are highlighted instead of six. On my original patch, the useless highlight is gone. The useless semicolon-related highlights on e.g. commit 99a2cfb are gone. Nice, the ones like 99a2cfb are definitely wrong (I had though to fix them eventually by treating some punctuation as uninteresting, but I suspect the percentage heuristic covers that reasonably well in practice). Of course, these patches are both hacks but they seem to be surprisingly effective hacks especially when paired together. The whole script is a (surprisingly effective) hack. ;) So I dunno. IMHO this does more harm than good, and I would not want to use it myself. But it is somewhat a matter of taste; I am not opposed to making it a configurable option. That is something I can do :) Coupled with the 10%-threshold patch, I think it would be OK to include it unconditionally. So far we've just been diffing the two outputs and micro-analyzing them. The real test to me will be using it in practice and seeing if it's helpful or annoying. -- >8 -- Subject: [PATCH] diff-highlight: don't highlight lines that have little in common --- contrib/diff-highlight/diff-highlight | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index 85d2eb0..0cc2b60 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -218,8 +218,21 @@ sub is_pair_interesting { my $suffix_a = join('', @$a[($sa+1)..$#$a]); my $suffix_b = join('', @$b[($sb+1)..$#$b]); - return $prefix_a !~ /^$COLOR*-$BORING*$/ || - $prefix_b !~ /^$COLOR*\+$BORING*$/ || - $suffix_a !~ /^$BORING*$/ || - $suffix_b !~ /^$BORING*$/; + $prefix_a =~ s/^$COLOR*-$BORING*//; + $prefix_b =~ s/^$COLOR*\+$BORING*//; + $suffix_a =~ s/$BORING*$//; + $suffix_b =~ s/$BORING*$//; + + my $whole_a = join ('', @$a); + $whole_a =~ s/^$COLOR*-$BORING*//; + $whole_a =~ s/$BORING*$//; + +
Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, Jun 18, 2015 at 5:23 PM, Jeff King wrote: > On Thu, Jun 18, 2015 at 04:45:05PM -0400, Jeff King wrote: > >> Still, I think this is probably a minority case, and it may be >> outweighed by the improvements. The "real" solution is to consider the >> hunk as a whole and do an LCS diff on it, which would show that yes, >> it's worth highlighting both of those spots, as they are a small >> percentage of the total hunk. > > I've been meaning to play with this for years, so I took the opportunity > to spend a little time on it. :) Cool! > > Below is a (slightly hacky) patch I came up with. It seems to work, and > produces really great output in some cases. For instance, in 99a2cfb, it > produces (I put highlighted bits in angle brackets): > > - cpy(peeled, ); > + cpy(<&>peeled, ); > > It also produces nonsense like: > > - sed peeled<[20]>; > + sed peeled; That's not even so bad. The diff of the change itself is... interesting. > > but I think that is simply because my splitting function is terrible (it > splits each byte, whereas we'd probably want to use whitespace and > punctuation, or something content-specific). I hope you can polish this. It definitely has potential. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, Jun 18, 2015 at 3:08 PM, Jeff King wrote: > On Thu, Jun 18, 2015 at 12:28:58PM -0400, Patrick Palka wrote: > >> By the way, what would it take to get something like this script into >> git proper? It is IMHO immensely useful even in its current form, yet >> because it's not baked into the application hardly anybody knows about >> it. > > I think if we were going to make it more official, it would make sense > to do it inside the diff code itself (i.e., not as a separate script), > and it might be reasonable at that point to actually do a "real" > character-based diff rather than the hacky prefix/suffix thing (or > possibly even integrate with the color-words patterns to find > syntactically interesting breaks). There is some discussion in the > "Bugs" section of contrib/diff-highlight/README. > > -Peff Thanks for the pointers. This is something I am interested in implementing (though not any time soon). I was actually in the process of familiarizing myself with the diff code before I discovered the existence of diff-highlight by accident. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, Jun 18, 2015 at 2:08 PM, Junio C Hamano wrote: > Patrick Palka writes: > >>> I have this nagging feeling that it is just as likely that two >>> uneven hunks align at the top as they align at the bottom, so while >>> this might not hurt it may not be the right approach for a better >>> solution, in the sense that when somebody really wants to do a >>> better solution, this change and the original code may need to be >>> ripped out and redone from scratch. >> >> Hmm, maybe. I stuck with assuming hunks are top-aligned because it >> required less code to implement :) > > Yeah, I understand that. > > If we will need to rip out only this change but keep the original in > order to implement a future better solution, then we might be better > off not having this change (if we anticipate such a better solution > to come reasonably soon), because it would make it more work for the > final improved solution. But if we need to rip out the original as > well as this change while we do so, then having this patch would not > make it more work, either. > > So as I said, I do not think it would hurt to have this as an > incremental improvement (albeit going in a possibly wrong > direction). > > Of course, it is a separate question if this change makes the output > worse, by comparing unmatched early parts of two hunks and making > nonsense highlight by calling highlight_pair() more often. As long > as that is not an issue, I am not opposed to this change, which was > what I meant to say by "this might not hurt". > That makes sense. The extra useless highlighting indeed is currently a problem but it may yet be worked around. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, 18 Jun 2015, Jeff King wrote: On Thu, Jun 18, 2015 at 11:08:16AM -0700, Junio C Hamano wrote: So as I said, I do not think it would hurt to have this as an incremental improvement (albeit going in a possibly wrong direction). Of course, it is a separate question if this change makes the output worse, by comparing unmatched early parts of two hunks and making nonsense highlight by calling highlight_pair() more often. As long as that is not an issue, I am not opposed to this change, which was what I meant to say by "this might not hurt". Yes, that is my big concern, and why I punted on mismatched-size hunks in the first place. Now that we have a patch, it is easy enough to "git log -p | diff-highlight" with the old and new versions to compare the results. It certainly does improve some cases. E.g.: -foo +foo && +bar in a test script becomes more clear. But some of the output is not so great. For instance, the very commit under discussion has a confusing and useless highlight. Or take a documentation patch like 5c31acfb, where I find the highlights actively distracting. We are saved a little by the "if the whole line is different, do not highlight at all" behavior of 097128d1bc. To fix the useless highlights for both evenly and unevenly sized hunks (like when all but a semicolon on a line changes), one can loosen the criterion for not highlighting from "do not highlight if 0% of the before and after lines are common between them" to, say, "do not highlight if less than 10% of the before and after lines are common between them". Then most of these useless highlights are gone for both evenly and unevenly sized hunks. Here is a patch that changes the criterion as mentioned. Testing this change on the documentation patch 5c31acfb, only two pairs of lines are highlighted instead of six. On my original patch, the useless highlight is gone. The useless semicolon-related highlights on e.g. commit 99a2cfb are gone. Ten percent is a modest threshold, and perhaps it should be increased when highlighting unevenly sized hunks and decreased when highlighting evenly sized hunks. Of course, these patches are both hacks but they seem to be surprisingly effective hacks especially when paired together. So I dunno. IMHO this does more harm than good, and I would not want to use it myself. But it is somewhat a matter of taste; I am not opposed to making it a configurable option. That is something I can do :) -Peff -- >8 -- Subject: [PATCH] diff-highlight: don't highlight lines that have little in common --- contrib/diff-highlight/diff-highlight | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index 85d2eb0..e4829ec 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -218,8 +218,13 @@ sub is_pair_interesting { my $suffix_a = join('', @$a[($sa+1)..$#$a]); my $suffix_b = join('', @$b[($sb+1)..$#$b]); - return $prefix_a !~ /^$COLOR*-$BORING*$/ || - $prefix_b !~ /^$COLOR*\+$BORING*$/ || - $suffix_a !~ /^$BORING*$/ || - $suffix_b !~ /^$BORING*$/; + $prefix_a =~ s/^$COLOR*-$BORING*//; + $prefix_b =~ s/^$COLOR*\+$BORING*//; + $suffix_a =~ s/$BORING*$//; + $suffix_b =~ s/$BORING*$//; + + # Only bother highlighting if at least 10% of each line is common among + # the lines. + return ((length($prefix_a)+length($suffix_a))*100 >= @$a*10) && + ((length($prefix_b)+length($suffix_b))*100 >= @$b*10); } -- 2.4.4.410.g43ed522.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
On Thu, Jun 18, 2015 at 11:50 AM, Junio C Hamano wrote: > Patrick Palka writes: > >> Currently the diff-highlight script does not try to highlight hunks that >> have different numbers of removed/added lines. But we can be a little >> smarter than that, without introducing much magic and complexity. >> >> In the case of unevenly-sized hunks, we could still highlight the first >> few (lexicographical) add/remove pairs. It is not uncommon for hunks to >> have common "prefixes", and in such a case this change is very useful >> for spotting differences. >> >> Signed-off-by: Patrick Palka >> --- > > Patrick, "git shortlog --no-merges contrib/diff-highlight/" is your > friend to see who may be able to give you a good feedback. Sorry about that. I admit the sending of this patch was rushed for no good reason. > > Jeff, what do you think? > > I have this nagging feeling that it is just as likely that two > uneven hunks align at the top as they align at the bottom, so while > this might not hurt it may not be the right approach for a better > solution, in the sense that when somebody really wants to do a > better solution, this change and the original code may need to be > ripped out and redone from scratch. Hmm, maybe. I stuck with assuming hunks are top-aligned because it required less code to implement :) The benefits of a simple dumb solution like assuming hunks align at the top or bottom is that it remains very easy to visually match up each highlighted deleted slice with its corresponding highlighted added slice. If we start matching up similar lines or something like that then it seems we would have to mostly forsake this benefit. A stupid algorithm in this case is nice because its output is predictable. A direct improvement upon this patch that would not require redoing the whole script from scratch would be to first to calculate the highlighting assuming the hunk aligns at the top, then to calculate the highlighting assuming the hunk aligns at the bottom, and to pick out of the two the highlighting with the least "noise". Though we would still be out of luck if the hunk is more complicated than being top-aligned or bottom-aligned. By the way, what would it take to get something like this script into git proper? It is IMHO immensely useful even in its current form, yet because it's not baked into the application hardly anybody knows about it. > >> contrib/diff-highlight/diff-highlight | 26 +- >> 1 file changed, 17 insertions(+), 9 deletions(-) >> >> diff --git a/contrib/diff-highlight/diff-highlight >> b/contrib/diff-highlight/diff-highlight >> index ffefc31..0dfbebd 100755 >> --- a/contrib/diff-highlight/diff-highlight >> +++ b/contrib/diff-highlight/diff-highlight >> @@ -88,22 +88,30 @@ sub show_hunk { >> return; >> } >> >> - # If we have mismatched numbers of lines on each side, we could try to >> - # be clever and match up similar lines. But for now we are simple and >> - # stupid, and only handle multi-line hunks that remove and add the same >> - # number of lines. >> - if (@$a != @$b) { >> - print @$a, @$b; >> - return; >> - } >> + # We match up the first MIN(a, b) lines on each side. >> + my $c = @$a < @$b ? @$a : @$b; >> >> + # Highlight each pair, and print each removed line of that pair. >> my @queue; >> - for (my $i = 0; $i < @$a; $i++) { >> + for (my $i = 0; $i < $c; $i++) { >> my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]); >> print $rm; >> push @queue, $add; >> } >> + >> + # Print the remaining unmatched removed lines of the hunk. >> + for (my $i = $c; $i < @$a; $i++) { >> + print $a->[$i]; >> + } >> + >> + # Print the added lines of each highlighted pair. >> print @queue; >> + >> + # Print the remaining unmatched added lines of the hunk. >> + for (my $i = $c; $i < @$b; $i++) { >> + print $b->[$i]; >> + } >> + >> } >> >> sub highlight_pair { -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks
Currently the diff-highlight script does not try to highlight hunks that have different numbers of removed/added lines. But we can be a little smarter than that, without introducing much magic and complexity. In the case of unevenly-sized hunks, we could still highlight the first few (lexicographical) add/remove pairs. It is not uncommon for hunks to have common "prefixes", and in such a case this change is very useful for spotting differences. Signed-off-by: Patrick Palka --- contrib/diff-highlight/diff-highlight | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index ffefc31..0dfbebd 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -88,22 +88,30 @@ sub show_hunk { return; } - # If we have mismatched numbers of lines on each side, we could try to - # be clever and match up similar lines. But for now we are simple and - # stupid, and only handle multi-line hunks that remove and add the same - # number of lines. - if (@$a != @$b) { - print @$a, @$b; - return; - } + # We match up the first MIN(a, b) lines on each side. + my $c = @$a < @$b ? @$a : @$b; + # Highlight each pair, and print each removed line of that pair. my @queue; - for (my $i = 0; $i < @$a; $i++) { + for (my $i = 0; $i < $c; $i++) { my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]); print $rm; push @queue, $add; } + + # Print the remaining unmatched removed lines of the hunk. + for (my $i = $c; $i < @$a; $i++) { + print $a->[$i]; + } + + # Print the added lines of each highlighted pair. print @queue; + + # Print the remaining unmatched added lines of the hunk. + for (my $i = $c; $i < @$b; $i++) { + print $b->[$i]; + } + } sub highlight_pair { -- 2.4.3.413.ga5fe668 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] reset: optionally setup worktree and refresh index on --mixed
On Sat, Feb 15, 2014 at 9:28 PM, Nguyễn Thái Ngọc Duy wrote: > Refreshing index requires work tree. So we have to options: always set > up work tree (and refuse to reset if failing to do so), or make > refreshing index optional. > > As refreshing index is not the main task, it makes more sense to make > it optional. > > Reported-by: Patrick Palka > Signed-off-by: Nguyễn Thái Ngọc Duy Thanks! I can confirm that this change fixes my use case. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git-reset does not seem to respect GIT_WORK_TREE
Hi everyone, I noticed that git-reset does not seem to respect GIT_WORK_TREE. Here is a simplified test case: $ mkdir src_dir && cd src_dir $ git init $ touch A && git add A && git commit -m "Dummy commit." $ mkdir ../build_dir && cd ../build_dir $ export GIT_WORK_TREE=../src_dir $ export GIT_DIR=../src_dir/.git $ git reset Unstaged changes after reset: D A The final command "git reset" erroneously suggests that the file "A" does not exist in the working tree. Does anybody know why git-reset behaves this way? Thanks, Patrick -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Documentation: improve the example of overriding LESS via core.pager
You can override an option set in the LESS variable by simply prefixing the command line option with `-+`. This is more robust than the previous example if the default LESS options are to ever change. Signed-off-by: Patrick Palka --- Documentation/config.txt |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 11f320b..9a0544c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -538,14 +538,14 @@ core.pager:: `LESS` variable to some other value. Alternately, these settings can be overridden on a project or global basis by setting the `core.pager` option. - Setting `core.pager` has no affect on the `LESS` + Setting `core.pager` has no effect on the `LESS` environment variable behaviour above, so if you want to override git's default settings this way, you need to be explicit. For example, to disable the S option in a backward compatible manner, set `core.pager` - to `less -+$LESS -FRX`. This will be passed to the - shell by git, which will translate the final command to - `LESS=FRSX less -+FRSX -FRX`. + to `less -+S`. This will be passed to the shell by + git, which will translate the final command to + `LESS=FRSX less -+S`. core.whitespace:: A comma separated list of common whitespace problems to -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html