Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-20 Thread Stefan Beller
On Fri, May 19, 2017 at 9:50 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> That could be added in ws.c:ws_check_emit, as these certain words
>> are similar to coloring whitespace.
>
> I actually was envisioning of highlighting a part of a line, like
>
> -Very poor SCM
> +Very nice SCM
>
> which would be done by finding semi-matching removed and added lines
> in the same hunk (i.e. local buffering) and makes a coloring decision.
> That does not have any place in ws.c.

Yes such a feature would not want to be in ws.c

For the problem above, we're still fine with the given data structures IMO.
Though it may hint at bad naming of the struct 'buffered_patch_line'
as it is not a complete line.

Assuming the example above, running "show --word-diff" would produce
the following "buffered_patch_lines"

{show_prefix=1, sign='\0', line="-Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="{- poor}", set='red', reset='normal'}
{show_prefix=0, sign='\0', line="{+ nice}", set='green', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}

so in the future, we may want to produce

{show_prefix=1, sign='-', line="Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="poor", set='red', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}
{show_prefix=1, sign='+', line="Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="nice", set='gren', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}

instead.

I think the data structure can be used as-is, but is just mis-named.
I'll fix that in a resend.

The algorithm to highlight what changed on a word level after a hunk
would have to be put into the current hunk finding
("mark_color_as_moved"), and then split up the actual lines into pieces
of a line and add different colors like we see above.

>
>>> Having said that, we need to start somewhere, and I think it is a
>>> reasonable first-cut attempt to work on top of the textual output
>>> like this series does (IOW, while I do agree with the NEEDSWORK and
>>> the way this series currently does things must be revamped in the
>>> longer term, I do not think we should wait until that happens to
>>> start playing with this topic).
>>
>> Ok. I share a similar reaction to submodule diffs that we discuss above
>> and word coloring, that Jonathan Tan brought up off list.
>>
>> Both of them are broken in this implementation, but the NEEDSWORK
>> would hint at how to fix them.
>
> Yes, but if NEEDSWORK has to say "the current hack is working at a
> wrong level, we need to do all of this before producing textual
> diffs that are passed to the layer that colors lines", that wouldn't
> help that much as a hint X-<.

For the word coloring, I think we'd just need better post processing.

For cros-submodule move detection, we may want to wait in Brandons
work to be able to run submodule stuff in-process and then we
have the lines buffered directly there, so we can operate on them as well.

I agree that "NEEDSWORK: the current hack is working at a wrong level"
is not useful. But I thought we're in agreement that it is not? I must have
misunderstood parts of what you were saying.

By saying it could go to ws.c in my reply I rather meant:
it could go into some post-processing function of the "buffered_patch_line"
struct. Currently we only have ws_emit_line that does it, but we can do
it either similarly or just split up one buffered_patch_line into more
than one and color up each individually.

This would also be possible for us now, too. Instead of running it through
ws_emit_line we could split up the line and color each piece differently.
However that could be a problem in the algorithm to find similar hunks
(as we do have different rules for added and deleted text).

Thanks,
Stefan


Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-19 Thread Junio C Hamano
Stefan Beller  writes:

> That could be added in ws.c:ws_check_emit, as these certain words
> are similar to coloring whitespace.

I actually was envisioning of highlighting a part of a line, like

-Very poor SCM
+Very nice SCM

which would be done by finding semi-matching removed and added lines
in the same hunk (i.e. local buffering) and makes a coloring decision.
That does not have any place in ws.c.

>> Having said that, we need to start somewhere, and I think it is a
>> reasonable first-cut attempt to work on top of the textual output
>> like this series does (IOW, while I do agree with the NEEDSWORK and
>> the way this series currently does things must be revamped in the
>> longer term, I do not think we should wait until that happens to
>> start playing with this topic).
>
> Ok. I share a similar reaction to submodule diffs that we discuss above
> and word coloring, that Jonathan Tan brought up off list.
>
> Both of them are broken in this implementation, but the NEEDSWORK
> would hint at how to fix them.

Yes, but if NEEDSWORK has to say "the current hack is working at a
wrong level, we need to do all of this before producing textual
diffs that are passed to the layer that colors lines", that wouldn't
help that much as a hint X-<.


Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-18 Thread Stefan Beller
On Wed, May 17, 2017 at 8:25 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
 +static void show_submodule_header(struct diff_options *o, const char 
 *path,
   struct object_id *one, struct object_id *two,
   unsigned dirty_submodule, const char *meta,
   const char *reset,
>>> ...
>>> How does capturing these lines help moved line detection, by the
>>> way?  These must never be matched with any other added or removed
>>> line in the real patch output.
>>
>> Why?
>
> What are buffered are not patch text, but informational text like
> "Submodule X contains untracked content", etc.  When a text file is
> modified elsewhere and lost a line that happened to say the same
> contents, we do not want to consider that such a line was moved to
> where a submodule had an untracked file.
>
> I have a suspicion that the two-pass buffering is done at too high a
> level in this series.  Doesn't the code (I haven't reached the end
> of the series) update emit_line() to buffer the patch text and these
> non-patch text with all the coloring and resetting sequences?
> Because the "ah, this old line removed corresponds to that new line
> that appears elsewhere?" logic do not want to see these color/reset
> sequence, the buffering code needs to become quite specific to how
> the current diff code is colored (e.g. a line must be painted in a
> single color and have reset at the end) and makes future change to
> color things differently almost impossible (e.g. imagine how you
> would add a "feature" that paints certain words on added lines
> differently?).

That could be added in ws.c:ws_check_emit, as these certain words
are similar to coloring whitespace.

It depends on the precedence of such a future feature, is the move
detection or the word highlighting more important to keep its color?

> Ahh, yes, I see NEEDSWORK comment in patch 19/20.
>
> Yes, I agree that this code really should be working in terms of
> offsets into pre/post images when finding matching changes, which
> probably should happen without letting fn_out_consume produce fully
> colored textual diff output in the first pass.  For the purpose of
> "moved lines detection", the logic to match a stretch of preimage
> lines with postimage lines do not want to bother with "--- a/$path"
> headers, and it does not want to care if a line that begins with '+'
> needs to be added by calling emit_add_line() that knows how to check
> ws errors or the payload needs to be painted in green.  After the
> first pass determines which added lines are not true addition but
> merely moved, the second pass would know how that '+' line needs to
> be painted a lot better (e.g. it may not be painted in green).
> Letting fn_out_consume() call emit_add_line() only to compute
> information (e.g. "'+'? ok, green") that the first pass does not
> want to see and the second pass will compute better is probably not
> a good longer term direction to go in.
>
> Having said that, we need to start somewhere, and I think it is a
> reasonable first-cut attempt to work on top of the textual output
> like this series does (IOW, while I do agree with the NEEDSWORK and
> the way this series currently does things must be revamped in the
> longer term, I do not think we should wait until that happens to
> start playing with this topic).

Ok. I share a similar reaction to submodule diffs that we discuss above
and word coloring, that Jonathan Tan brought up off list.

Both of them are broken in this implementation, but the NEEDSWORK
would hint at how to fix them.

For word coloring, we'd invent a new state BPL_EMIT_WITH_BRACES,
that would only store the word and at output time we'd have to add
sign, braces, colors. Then a block movement detection is possible.
(and this would also work with offset/len into the files longer term)

For the submodule diffs, I am really looking forward how Brandons
current work is coming along to have a repository struct such that we
can process submodules in the same process. For this diffing the
repo object would need to learn about the attribute system of
the submodules, such that we can obtain the whitespace coloring
rules, as well as the config (submodule may be configured to use
different colors for diffs).


Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-17 Thread Junio C Hamano
Stefan Beller  writes:

>>> +static void show_submodule_header(struct diff_options *o, const char *path,
>>>   struct object_id *one, struct object_id *two,
>>>   unsigned dirty_submodule, const char *meta,
>>>   const char *reset,
>> ...
>> How does capturing these lines help moved line detection, by the
>> way?  These must never be matched with any other added or removed
>> line in the real patch output.
>
> Why?

What are buffered are not patch text, but informational text like
"Submodule X contains untracked content", etc.  When a text file is
modified elsewhere and lost a line that happened to say the same
contents, we do not want to consider that such a line was moved to
where a submodule had an untracked file.

I have a suspicion that the two-pass buffering is done at too high a
level in this series.  Doesn't the code (I haven't reached the end
of the series) update emit_line() to buffer the patch text and these
non-patch text with all the coloring and resetting sequences?
Because the "ah, this old line removed corresponds to that new line
that appears elsewhere?" logic do not want to see these color/reset
sequence, the buffering code needs to become quite specific to how
the current diff code is colored (e.g. a line must be painted in a
single color and have reset at the end) and makes future change to
color things differently almost impossible (e.g. imagine how you
would add a "feature" that paints certain words on added lines
differently?).

Ahh, yes, I see NEEDSWORK comment in patch 19/20.  

Yes, I agree that this code really should be working in terms of
offsets into pre/post images when finding matching changes, which
probably should happen without letting fn_out_consume produce fully
colored textual diff output in the first pass.  For the purpose of
"moved lines detection", the logic to match a stretch of preimage
lines with postimage lines do not want to bother with "--- a/$path"
headers, and it does not want to care if a line that begins with '+'
needs to be added by calling emit_add_line() that knows how to check
ws errors or the payload needs to be painted in green.  After the
first pass determines which added lines are not true addition but
merely moved, the second pass would know how that '+' line needs to
be painted a lot better (e.g. it may not be painted in green).
Letting fn_out_consume() call emit_add_line() only to compute
information (e.g. "'+'? ok, green") that the first pass does not
want to see and the second pass will compute better is probably not
a good longer term direction to go in.

Having said that, we need to start somewhere, and I think it is a
reasonable first-cut attempt to work on top of the textual output
like this series does (IOW, while I do agree with the NEEDSWORK and
the way this series currently does things must be revamped in the
longer term, I do not think we should wait until that happens to
start playing with this topic).

Thanks.





Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-17 Thread Stefan Beller
On Tue, May 16, 2017 at 10:19 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> In a later patch, I want to propose an option to detect
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This prepares the code for submodules to go through the
>> emit_line function.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>  diff.c  | 20 +++-
>>  diff.h  |  5 
>>  submodule.c | 78 
>> ++---
>>  submodule.h |  9 +++
>>  4 files changed, 56 insertions(+), 56 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 690794aeb8..7c8d6a5d12 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
>> *mf2,
>>   ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>>  }
>>
>> -static void emit_line(struct diff_options *o, const char *set, const char 
>> *reset,
>> -   int add_line_prefix, int sign, const char *line, int len)
>> +void emit_line(struct diff_options *o, const char *set, const char *reset,
>> +int add_line_prefix, int sign, const char *line, int len)
>>  {
>>   int has_trailing_newline, has_trailing_carriage_return;
>>   FILE *file = o->file;
>> @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const 
>> char *set, const char *reset
>>   fputc('\n', file);
>>  }
>>
>> -static void emit_line_fmt(struct diff_options *o,
>> -   const char *set, const char *reset,
>> -   int add_line_prefix,
>> -   const char *fmt, ...)
>> +void emit_line_fmt(struct diff_options *o,
>> +const char *set, const char *reset,
>> +int add_line_prefix,
>> +const char *fmt, ...)
>
> Interesting...
>
>> -static void show_submodule_header(FILE *f, const char *path,
>> - const char *line_prefix,
>> +static void show_submodule_header(struct diff_options *o, const char *path,
>>   struct object_id *one, struct object_id *two,
>>   unsigned dirty_submodule, const char *meta,
>>   const char *reset,
>
> Is this ONLY called when the caller wants its output inserted to the
> "diff" (or "log -p") output?

Yes.

>  If so, I think it makes sense to pass
> 'o', but if the function is oblivious that it is driven to produce
> part of a "diff", it feels wrong to pass 'o'.  The original was
> taking a "FILE *" and line_prefix, so it is rather clear that the
> answer to the question is "yes, this is very closely tied to diff
> output".  Now you have access to 'o', so you do not need to pass
> them separately.  Good.

ok.

> Each line in its output, when incorporated in "diff" or "log -p"
> output, must be prefixed with the line-prefix to accomodate users of
> "log --graph", so I guess it cannot be helped.  Your calls to
> emit_line_fmt() below seems to ask the line-prefix to be added,
> which is good, too.
>
> How does capturing these lines help moved line detection, by the
> way?  These must never be matched with any other added or removed
> line in the real patch output.

Why?

Actually I think it has some value if it can match across
(submodule-)repository boundaries, e.g. think of Ævars RFC to put
SHA1DC into a submodule. If reviewing that commit later on, a user
may be interested in "what is the difference between what we carried so
far in this repo compared to what we point at now in the submodule".
Most of the code should be the same, but anchored at a different
path/repo, so a move detection would be super helpful.

I do understand that you may not want to see a move crossing
a repo boundary, but I would prefer that to a later patch, once we
have a better understanding on the use cases of this new feature.

>>   if (is_null_oid(one))
>>   message = "(new submodule)";
>
> emit_line() and emit_line_fmt() are both inappropriate names for a
> global function.  These are very closely tied to diff generation, so
> we probably want to see some form of "diff" in their names.

Oh, uh. You're right.

I would think inside of diff.c we'd still want to keep the short name,
so maybe I'd expose a wrapper to the outside world.

Maybe

diff_emit_strbuf(diff_options *, strbuf *)

would be fine for all use cases from outside.


> The fact that it is clear because its first parameter is "struct
> diff_options" is insufficient---"you cannot tell what context the
> function is meant to be used by only looking at its name" is
> certainly 

Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-16 Thread Junio C Hamano
Stefan Beller  writes:

> In a later patch, I want to propose an option to detect
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This prepares the code for submodules to go through the
> emit_line function.
>
> Signed-off-by: Stefan Beller 
> ---
>  diff.c  | 20 +++-
>  diff.h  |  5 
>  submodule.c | 78 
> ++---
>  submodule.h |  9 +++
>  4 files changed, 56 insertions(+), 56 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 690794aeb8..7c8d6a5d12 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
> *mf2,
>   ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>  
> -static void emit_line(struct diff_options *o, const char *set, const char 
> *reset,
> -   int add_line_prefix, int sign, const char *line, int len)
> +void emit_line(struct diff_options *o, const char *set, const char *reset,
> +int add_line_prefix, int sign, const char *line, int len)
>  {
>   int has_trailing_newline, has_trailing_carriage_return;
>   FILE *file = o->file;
> @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const 
> char *set, const char *reset
>   fputc('\n', file);
>  }
>  
> -static void emit_line_fmt(struct diff_options *o,
> -   const char *set, const char *reset,
> -   int add_line_prefix,
> -   const char *fmt, ...)
> +void emit_line_fmt(struct diff_options *o,
> +const char *set, const char *reset,
> +int add_line_prefix,
> +const char *fmt, ...)

Interesting...

> -static void show_submodule_header(FILE *f, const char *path,
> - const char *line_prefix,
> +static void show_submodule_header(struct diff_options *o, const char *path,
>   struct object_id *one, struct object_id *two,
>   unsigned dirty_submodule, const char *meta,
>   const char *reset,

Is this ONLY called when the caller wants its output inserted to the
"diff" (or "log -p") output?  If so, I think it makes sense to pass
'o', but if the function is oblivious that it is driven to produce
part of a "diff", it feels wrong to pass 'o'.  The original was
taking a "FILE *" and line_prefix, so it is rather clear that the
answer to the question is "yes, this is very closely tied to diff
output".  Now you have access to 'o', so you do not need to pass
them separately.  Good.

Each line in its output, when incorporated in "diff" or "log -p"
output, must be prefixed with the line-prefix to accomodate users of
"log --graph", so I guess it cannot be helped.  Your calls to
emit_line_fmt() below seems to ask the line-prefix to be added,
which is good, too.

How does capturing these lines help moved line detection, by the
way?  These must never be matched with any other added or removed
line in the real patch output.

> @@ -426,11 +419,11 @@ static void show_submodule_header(FILE *f, const char 
> *path,
>   int fast_forward = 0, fast_backward = 0;
>  
>   if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
> - fprintf(f, "%sSubmodule %s contains untracked content\n",
> - line_prefix, path);
> + emit_line_fmt(o, NULL, NULL, 1,
> +   "Submodule %s contains untracked content\n", 
> path);
>   if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
> - fprintf(f, "%sSubmodule %s contains modified content\n",
> - line_prefix, path);
> + emit_line_fmt(o, NULL, NULL, 1,
> +   "Submodule %s contains modified content\n", path);
>  
>   if (is_null_oid(one))
>   message = "(new submodule)";

emit_line() and emit_line_fmt() are both inappropriate names for a
global function.  These are very closely tied to diff generation, so
we probably want to see some form of "diff" in their names.  

The fact that it is clear because its first parameter is "struct
diff_options" is insufficient---"you cannot tell what context the
function is meant to be used by only looking at its name" is
certainly solved by its function signature, but the other issue with
an overly generic name is that other codepaths in different contexts
may want to use such a short and sweet name.

Thanks.


[PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-16 Thread Stefan Beller
In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This prepares the code for submodules to go through the
emit_line function.

Signed-off-by: Stefan Beller 
---
 diff.c  | 20 +++-
 diff.h  |  5 
 submodule.c | 78 ++---
 submodule.h |  9 +++
 4 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/diff.c b/diff.c
index 690794aeb8..7c8d6a5d12 100644
--- a/diff.c
+++ b/diff.c
@@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
- int add_line_prefix, int sign, const char *line, int len)
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+  int add_line_prefix, int sign, const char *line, int len)
 {
int has_trailing_newline, has_trailing_carriage_return;
FILE *file = o->file;
@@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
fputc('\n', file);
 }
 
-static void emit_line_fmt(struct diff_options *o,
- const char *set, const char *reset,
- int add_line_prefix,
- const char *fmt, ...)
+void emit_line_fmt(struct diff_options *o,
+  const char *set, const char *reset,
+  int add_line_prefix,
+  const char *fmt, ...)
 {
struct strbuf sb = STRBUF_INIT;
va_list ap;
@@ -2386,8 +2386,7 @@ static void builtin_diff(const char *name_a,
(!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_summary(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_summary(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule,
meta, del, add, reset);
@@ -2397,11 +2396,10 @@ static void builtin_diff(const char *name_a,
   (!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_inline_diff(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_inline_diff(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule,
-   meta, del, add, reset, o);
+   meta, del, add, reset);
return;
}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..6e14100102 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,11 @@ struct diff_options {
int diff_path_counter;
 };
 
+void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
+  int add_line_prefix, const char *fmt, ...);
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+  int add_line_prefix, int sign, const char *line, int len);
+
 enum color_diff {
DIFF_RESET = 0,
DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..5996ebca44 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-   const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+   struct diff_options *o,
const char *del, const char *add, const char *reset)
 {
static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   strbuf_addstr(, line_prefix);
-   if (commit->object.flags & SYMMETRIC_LEFT) {
-   if (del)
-   strbuf_addstr(, del);
-   }
-