Difficulty with parsing colorized diff output

2018-12-07 Thread George King
Hello, I have a rather elaborate diff highlighter that I have implemented as a 
post-processor to regular git output. I am writing to discuss some difficult 
aspects of git diff's color output that I am observing with version 2.19.2. 
This is not a regression report; I am trying to implement a new feature and am 
stymied by these details.

My goal is to detect SGR color sequences, e.g. '\x1b[32m', that exist in the 
source text, and have my highlighter print escaped representations of those. 
For example, I have checked in files that are expected test outputs for tools 
that emit color codes, and diffs of those get very confusing.
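
To make the goal concrete, here is a minimal sketch of the escaping step in Python. It is not taken from same-same; the names `SGR_RE` and `escape_sgr` are hypothetical, and it assumes an SGR sequence is ESC, `[`, a run of digits and semicolons, then `m`.

```python
import re

# Assumed shape of an SGR sequence: ESC [ <digits and semicolons> m
SGR_RE = re.compile(r'\x1b\[[0-9;]*m')

def escape_sgr(text: str) -> str:
    """Replace each SGR sequence with a visible, escaped representation."""
    # Only the ESC byte is rewritten; the rest of the sequence is printable already.
    return SGR_RE.sub(lambda m: m.group(0).replace('\x1b', '\\x1b'), text)

# Prints the literal characters \x1b[32mok\x1b[m instead of coloring "ok".
print(escape_sgr('\x1b[32mok\x1b[m'))
```

The hard part is not this substitution but knowing which sequences to apply it to, since git's own codes must be left alone.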

Figuring out which color codes are from the source text and which were added by 
git is proving very difficult. The obvious solution is to turn git diff 
coloring off, but as far as I can see this also turns off all coloring for 
logs, which is undesirable.

Then I tried to remove just the color codes that git adds to the diff. This 
almost works, but there are some irregularities. Most lines begin with a 
style/color code and end with a reset code, which would be a perfect indicator 
that git is using colors. However:

* Context lines do not begin with any code, but do end with a reset code. It 
would be preferable in my opinion if they had both (like every other line), or 
none at all.

* Added lines have excess codes after the plus sign. The entire prefix is 
`\x1b[32m+\x1b[m\x1b[32m`, which translates to GREEN PLUS RESET GREEN. Emitting 
codes after the plus sign makes the parsing more complex and idiosyncratic.
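
The two irregularities above can be handled with a pattern along these lines. This is only a sketch under the assumptions stated in this message: an optional leading color code, a one-character marker, an optional RESET plus re-applied color right after the marker, and an optional trailing reset. The names `LINE_RE` and `split_git_line` are hypothetical, and the input line is assumed to have its newline already removed.

```python
import re

LINE_RE = re.compile(
    r'(?P<lead>\x1b\[[0-9;]*m)?'    # optional leading color (absent on context lines)
    r'(?P<marker>[ +-])'            # diff marker: context, added, or removed
    r'(?:\x1b\[m\x1b\[[0-9;]*m)?'   # optional RESET + re-applied color after the marker
    r'(?P<content>.*?)'             # the line's text, including any SGR codes in the source
    r'(?:\x1b\[m)?$',               # optional trailing reset
    re.DOTALL)

def split_git_line(line: str):
    """Return (marker, content) with the assumed git-added codes removed, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('marker'), m.group('content')

# An added line whose source text itself contains color codes.
print(split_git_line('\x1b[32m+\x1b[m\x1b[32m\x1b[31mred\x1b[m\x1b[m'))
```

Note the remaining ambiguity: if the source text of a line happens to begin with a reset followed by a color code, this pattern cannot tell it apart from what git adds after the plus sign, which is why a more regular output format would help.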


In summary, I would like to suggest the following improvements:

* Remove the excess codes after the plus sign.

* When git diff is adding colors, ensure that every line begins with an SGR 
code and ends with the RESET code.

* Add a config feature to turn on log coloring while leaving diff coloring off.


I would be willing to attempt a fix for this myself, but I'd like to hear what 
the maintainers think first, and would appreciate any hints as to where I 
should start looking in the code base.


If anyone is curious about the implementation, it is called `same-same` and 
lives here: https://github.com/gwk/pithy/blob/master/pithy/bin/same_same.py

I configure it like this in .gitconfig:

[core]
  pager = same-same | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS
[interactive]
  diffFilter = same-same -interactive | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS


Thank you,
George



Re: Git diff --no-index --no-prefix output loses leading slash in paths

2018-06-18 Thread George King
This is a feature request; sorry for the confusion. My guess is that it's a 
corner case that was not considered due to the default prefixing.


> On Jun 18, 2018, at 10:59 AM, Duy Nguyen wrote:
> 
> On Mon, Jun 18, 2018 at 4:36 PM George King wrote:
>> 
>> As of 2.17.1, `git diff --no-index --no-prefix relative/path /absolute/path` 
>> produces the following:
> 
> I checked as far back as v1.4.0 and git behaved the same way too. What
> version did it work for you? Or is this not a regression, rather a
> feature request?
> 
>> diff --git relative/path absolute/path
>> index XXX..YYY ZZ
>> --- relative/path
>> +++ absolute/path
>> 
>> The leading slash on `absolute/path` is lost. This is unfortunate; my use 
>> case is a diff highlighter that parses and reformats paths so that code 
>> editors can autodetect them and link to the files.
>> 
>> Would the maintainers please consider fixing the output to preserve absolute 
>> paths?
>> 
>> Thank you,
>> George King
>> 
> -- 
> Duy



Git diff --no-index --no-prefix output loses leading slash in paths

2018-06-18 Thread George King
As of 2.17.1, `git diff --no-index --no-prefix relative/path /absolute/path` 
produces the following:

diff --git relative/path absolute/path
index XXX..YYY ZZ
--- relative/path
+++ absolute/path

The leading slash on `absolute/path` is lost. This is unfortunate; my use case 
is a diff highlighter that parses and reformats paths so that code editors can 
autodetect them and link to the files. 
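
For illustration, a highlighter might pull the paths out of the header roughly like this. It is a sketch, not the actual same-same code; `header_paths` is a hypothetical name, and it assumes `--no-prefix` output where the path follows `--- ` or `+++ ` directly.

```python
def header_paths(diff_lines):
    """Return (old_path, new_path) from the '---' and '+++' header lines."""
    old = new = None
    for line in diff_lines:
        if line.startswith('--- '):
            old = line[4:].rstrip('\n')
        elif line.startswith('+++ '):
            new = line[4:].rstrip('\n')
            break  # the '+++' line ends the file header
    return old, new

sample = [
    'diff --git relative/path absolute/path',
    'index XXX..YYY ZZ',
    '--- relative/path',
    '+++ absolute/path',
]
print(header_paths(sample))
```

With the output shown above, the second path comes back as `absolute/path`, so the parser has no way to know it was originally absolute.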

Would the maintainers please consider fixing the output to preserve absolute 
paths?

Thank you,
George King