Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
On Fri, May 19, 2017 at 9:50 PM, Junio C Hamanowrote: > Stefan Beller writes: > >> That could be added in ws.c:ws_check_emit, as these certain words >> are similar to coloring whitespace. > > I actually was envisioning of highlighting a part of a line, like > > -Very poor SCM > +Very nice SCM > > which would be done by finding semi-matching removed and added lines > in the same hunk (i.e. local buffering) and makes a coloring decision. > That does not have any place in ws.c. Yes such a feature would not want to be in ws.c For the problem above, we're still fine with the given data structures IMO. Though it may hint at bad naming of the struct 'buffered_patch_line' as it is not a complete line. Assuming the example above, running "show --word-diff" would produce the following "buffered_patch_lines" {show_prefix=1, sign='\0', line="-Very ", set=NULL, reset=NULL} {show_prefix=0, sign='\0', line="{- poor}", set='red', reset='normal'} {show_prefix=0, sign='\0', line="{+ nice}", set='green', reset='normal'} {show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL} so in the future, we may want to produce {show_prefix=1, sign='-', line="Very ", set=NULL, reset=NULL} {show_prefix=0, sign='\0', line="poor", set='red', reset='normal'} {show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL} {show_prefix=1, sign='+', line="Very ", set=NULL, reset=NULL} {show_prefix=0, sign='\0', line="nice", set='gren', reset='normal'} {show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL} instead. I think the data structure can be used as-is, but is just mis-named. I'll fix that in a resend. The algorithm to highlight what changed on a word level after a hunk would have to be put into the current hunk finding ("mark_color_as_moved"), and then split up the actual lines into pieces of a line and add different colors like we see above. > >>> Having said that, we need to start somewhere, and I think it is a >>> reasonable first-cut attempt to work on top of the textual output >>> like this series does (IOW, while I do agree with the NEEDSWORK and >>> the way this series currently does things must be revamped in the >>> longer term, I do not think we should wait until that happens to >>> start playing with this topic). >> >> Ok. I share a similar reaction to submodule diffs that we discuss above >> and word coloring, that Jonathan Tan brought up off list. >> >> Both of them are broken in this implementation, but the NEEDSWORK >> would hint at how to fix them. > > Yes, but if NEEDSWORK has to say "the current hack is working at a > wrong level, we need to do all of this before producing textual > diffs that are passed to the layer that colors lines", that wouldn't > help that much as a hint X-<. For the word coloring, I think we'd just need better post processing. For cros-submodule move detection, we may want to wait in Brandons work to be able to run submodule stuff in-process and then we have the lines buffered directly there, so we can operate on them as well. I agree that "NEEDSWORK: the current hack is working at a wrong level" is not useful. But I thought we're in agreement that it is not? I must have misunderstood parts of what you were saying. By saying it could go to ws.c in my reply I rather meant: it could go into some post-processing function of the "buffered_patch_line" struct. Currently we only have ws_emit_line that does it, but we can do it either similarly or just split up one buffered_patch_line into more than one and color up each individually. This would also be possible for us now, too. Instead of running it through ws_emit_line we could split up the line and color each piece differently. However that could be a problem in the algorithm to find similar hunks (as we do have different rules for added and deleted text). Thanks, Stefan
Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
Stefan Bellerwrites: > That could be added in ws.c:ws_check_emit, as these certain words > are similar to coloring whitespace. I actually was envisioning of highlighting a part of a line, like -Very poor SCM +Very nice SCM which would be done by finding semi-matching removed and added lines in the same hunk (i.e. local buffering) and makes a coloring decision. That does not have any place in ws.c. >> Having said that, we need to start somewhere, and I think it is a >> reasonable first-cut attempt to work on top of the textual output >> like this series does (IOW, while I do agree with the NEEDSWORK and >> the way this series currently does things must be revamped in the >> longer term, I do not think we should wait until that happens to >> start playing with this topic). > > Ok. I share a similar reaction to submodule diffs that we discuss above > and word coloring, that Jonathan Tan brought up off list. > > Both of them are broken in this implementation, but the NEEDSWORK > would hint at how to fix them. Yes, but if NEEDSWORK has to say "the current hack is working at a wrong level, we need to do all of this before producing textual diffs that are passed to the layer that colors lines", that wouldn't help that much as a hint X-<.
Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
On Wed, May 17, 2017 at 8:25 PM, Junio C Hamanowrote: > Stefan Beller writes: > +static void show_submodule_header(struct diff_options *o, const char *path, struct object_id *one, struct object_id *two, unsigned dirty_submodule, const char *meta, const char *reset, >>> ... >>> How does capturing these lines help moved line detection, by the >>> way? These must never be matched with any other added or removed >>> line in the real patch output. >> >> Why? > > What are buffered are not patch text, but informational text like > "Submodule X contains untracked content", etc. When a text file is > modified elsewhere and lost a line that happened to say the same > contents, we do not want to consider that such a line was moved to > where a submodule had an untracked file. > > I have a suspicion that the two-pass buffering is done at too high a > level in this series. Doesn't the code (I haven't reached the end > of the series) update emit_line() to buffer the patch text and these > non-patch text with all the coloring and resetting sequences? > Because the "ah, this old line removed corresponds to that new line > that appears elsewhere?" logic do not want to see these color/reset > sequence, the buffering code needs to become quite specific to how > the current diff code is colored (e.g. a line must be painted in a > single color and have reset at the end) and makes future change to > color things differently almost impossible (e.g. imagine how you > would add a "feature" that paints certain words on added lines > differently?). That could be added in ws.c:ws_check_emit, as these certain words are similar to coloring whitespace. It depends on the precedence of such a future feature, is the move detection or the word highlighting more important to keep its color? > Ahh, yes, I see NEEDSWORK comment in patch 19/20. > > Yes, I agree that this code really should be working in terms of > offsets into pre/post images when finding matching changes, which > probably should happen without letting fn_out_consume produce fully > colored textual diff output in the first pass. For the purpose of > "moved lines detection", the logic to match a stretch of preimage > lines with postimage lines do not want to bother with "--- a/$path" > headers, and it does not want to care if a line that begins with '+' > needs to be added by calling emit_add_line() that knows how to check > ws errors or the payload needs to be painted in green. After the > first pass determines which added lines are not true addition but > merely moved, the second pass would know how that '+' line needs to > be painted a lot better (e.g. it may not be painted in green). > Letting fn_out_consume() call emit_add_line() only to compute > information (e.g. "'+'? ok, green") that the first pass does not > want to see and the second pass will compute better is probably not > a good longer term direction to go in. > > Having said that, we need to start somewhere, and I think it is a > reasonable first-cut attempt to work on top of the textual output > like this series does (IOW, while I do agree with the NEEDSWORK and > the way this series currently does things must be revamped in the > longer term, I do not think we should wait until that happens to > start playing with this topic). Ok. I share a similar reaction to submodule diffs that we discuss above and word coloring, that Jonathan Tan brought up off list. Both of them are broken in this implementation, but the NEEDSWORK would hint at how to fix them. For word coloring, we'd invent a new state BPL_EMIT_WITH_BRACES, that would only store the word and at output time we'd have to add sign, braces, colors. Then a block movement detection is possible. (and this would also work with offset/len into the files longer term) For the submodule diffs, I am really looking forward how Brandons current work is coming along to have a repository struct such that we can process submodules in the same process. For this diffing the repo object would need to learn about the attribute system of the submodules, such that we can obtain the whitespace coloring rules, as well as the config (submodule may be configured to use different colors for diffs).
Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
Stefan Bellerwrites: >>> +static void show_submodule_header(struct diff_options *o, const char *path, >>> struct object_id *one, struct object_id *two, >>> unsigned dirty_submodule, const char *meta, >>> const char *reset, >> ... >> How does capturing these lines help moved line detection, by the >> way? These must never be matched with any other added or removed >> line in the real patch output. > > Why? What are buffered are not patch text, but informational text like "Submodule X contains untracked content", etc. When a text file is modified elsewhere and lost a line that happened to say the same contents, we do not want to consider that such a line was moved to where a submodule had an untracked file. I have a suspicion that the two-pass buffering is done at too high a level in this series. Doesn't the code (I haven't reached the end of the series) update emit_line() to buffer the patch text and these non-patch text with all the coloring and resetting sequences? Because the "ah, this old line removed corresponds to that new line that appears elsewhere?" logic do not want to see these color/reset sequence, the buffering code needs to become quite specific to how the current diff code is colored (e.g. a line must be painted in a single color and have reset at the end) and makes future change to color things differently almost impossible (e.g. imagine how you would add a "feature" that paints certain words on added lines differently?). Ahh, yes, I see NEEDSWORK comment in patch 19/20. Yes, I agree that this code really should be working in terms of offsets into pre/post images when finding matching changes, which probably should happen without letting fn_out_consume produce fully colored textual diff output in the first pass. For the purpose of "moved lines detection", the logic to match a stretch of preimage lines with postimage lines do not want to bother with "--- a/$path" headers, and it does not want to care if a line that begins with '+' needs to be added by calling emit_add_line() that knows how to check ws errors or the payload needs to be painted in green. After the first pass determines which added lines are not true addition but merely moved, the second pass would know how that '+' line needs to be painted a lot better (e.g. it may not be painted in green). Letting fn_out_consume() call emit_add_line() only to compute information (e.g. "'+'? ok, green") that the first pass does not want to see and the second pass will compute better is probably not a good longer term direction to go in. Having said that, we need to start somewhere, and I think it is a reasonable first-cut attempt to work on top of the textual output like this series does (IOW, while I do agree with the NEEDSWORK and the way this series currently does things must be revamped in the longer term, I do not think we should wait until that happens to start playing with this topic). Thanks.
Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
On Tue, May 16, 2017 at 10:19 PM, Junio C Hamanowrote: > Stefan Beller writes: > >> In a later patch, I want to propose an option to detect >> moved lines in a diff, which cannot be done in a one-pass over >> the diff. Instead we need to go over the whole diff twice, >> because we cannot detect the first line of the two corresponding >> lines (+ and -) that got moved. >> >> So to prepare the diff machinery for two pass algorithms >> (i.e. buffer it all up and then operate on the result), >> move all emissions to places, such that the only emitting >> function is emit_line_0. >> >> This prepares the code for submodules to go through the >> emit_line function. >> >> Signed-off-by: Stefan Beller >> --- >> diff.c | 20 +++- >> diff.h | 5 >> submodule.c | 78 >> ++--- >> submodule.h | 9 +++ >> 4 files changed, 56 insertions(+), 56 deletions(-) >> >> diff --git a/diff.c b/diff.c >> index 690794aeb8..7c8d6a5d12 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t >> *mf2, >> ecbdata->blank_at_eof_in_postimage = (at - l2) + 1; >> } >> >> -static void emit_line(struct diff_options *o, const char *set, const char >> *reset, >> - int add_line_prefix, int sign, const char *line, int len) >> +void emit_line(struct diff_options *o, const char *set, const char *reset, >> +int add_line_prefix, int sign, const char *line, int len) >> { >> int has_trailing_newline, has_trailing_carriage_return; >> FILE *file = o->file; >> @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const >> char *set, const char *reset >> fputc('\n', file); >> } >> >> -static void emit_line_fmt(struct diff_options *o, >> - const char *set, const char *reset, >> - int add_line_prefix, >> - const char *fmt, ...) >> +void emit_line_fmt(struct diff_options *o, >> +const char *set, const char *reset, >> +int add_line_prefix, >> +const char *fmt, ...) > > Interesting... > >> -static void show_submodule_header(FILE *f, const char *path, >> - const char *line_prefix, >> +static void show_submodule_header(struct diff_options *o, const char *path, >> struct object_id *one, struct object_id *two, >> unsigned dirty_submodule, const char *meta, >> const char *reset, > > Is this ONLY called when the caller wants its output inserted to the > "diff" (or "log -p") output? Yes. > If so, I think it makes sense to pass > 'o', but if the function is oblivious that it is driven to produce > part of a "diff", it feels wrong to pass 'o'. The original was > taking a "FILE *" and line_prefix, so it is rather clear that the > answer to the question is "yes, this is very closely tied to diff > output". Now you have access to 'o', so you do not need to pass > them separately. Good. ok. > Each line in its output, when incorporated in "diff" or "log -p" > output, must be prefixed with the line-prefix to accomodate users of > "log --graph", so I guess it cannot be helped. Your calls to > emit_line_fmt() below seems to ask the line-prefix to be added, > which is good, too. > > How does capturing these lines help moved line detection, by the > way? These must never be matched with any other added or removed > line in the real patch output. Why? Actually I think it has some value if it can match across (submodule-)repository boundaries, e.g. think of Ævars RFC to put SHA1DC into a submodule. If reviewing that commit later on, a user may be interested in "what is the difference between what we carried so far in this repo compared to what we point at now in the submodule". Most of the code should be the same, but anchored at a different path/repo, so a move detection would be super helpful. I do understand that you may not want to see a move crossing a repo boundary, but I would prefer that to a later patch, once we have a better understanding on the use cases of this new feature. >> if (is_null_oid(one)) >> message = "(new submodule)"; > > emit_line() and emit_line_fmt() are both inappropriate names for a > global function. These are very closely tied to diff generation, so > we probably want to see some form of "diff" in their names. Oh, uh. You're right. I would think inside of diff.c we'd still want to keep the short name, so maybe I'd expose a wrapper to the outside world. Maybe diff_emit_strbuf(diff_options *, strbuf *) would be fine for all use cases from outside. > The fact that it is clear because its first parameter is "struct > diff_options" is insufficient---"you cannot tell what context the > function is meant to be used by only looking at its name" is > certainly
Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
Stefan Bellerwrites: > In a later patch, I want to propose an option to detect > moved lines in a diff, which cannot be done in a one-pass over > the diff. Instead we need to go over the whole diff twice, > because we cannot detect the first line of the two corresponding > lines (+ and -) that got moved. > > So to prepare the diff machinery for two pass algorithms > (i.e. buffer it all up and then operate on the result), > move all emissions to places, such that the only emitting > function is emit_line_0. > > This prepares the code for submodules to go through the > emit_line function. > > Signed-off-by: Stefan Beller > --- > diff.c | 20 +++- > diff.h | 5 > submodule.c | 78 > ++--- > submodule.h | 9 +++ > 4 files changed, 56 insertions(+), 56 deletions(-) > > diff --git a/diff.c b/diff.c > index 690794aeb8..7c8d6a5d12 100644 > --- a/diff.c > +++ b/diff.c > @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t > *mf2, > ecbdata->blank_at_eof_in_postimage = (at - l2) + 1; > } > > -static void emit_line(struct diff_options *o, const char *set, const char > *reset, > - int add_line_prefix, int sign, const char *line, int len) > +void emit_line(struct diff_options *o, const char *set, const char *reset, > +int add_line_prefix, int sign, const char *line, int len) > { > int has_trailing_newline, has_trailing_carriage_return; > FILE *file = o->file; > @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const > char *set, const char *reset > fputc('\n', file); > } > > -static void emit_line_fmt(struct diff_options *o, > - const char *set, const char *reset, > - int add_line_prefix, > - const char *fmt, ...) > +void emit_line_fmt(struct diff_options *o, > +const char *set, const char *reset, > +int add_line_prefix, > +const char *fmt, ...) Interesting... > -static void show_submodule_header(FILE *f, const char *path, > - const char *line_prefix, > +static void show_submodule_header(struct diff_options *o, const char *path, > struct object_id *one, struct object_id *two, > unsigned dirty_submodule, const char *meta, > const char *reset, Is this ONLY called when the caller wants its output inserted to the "diff" (or "log -p") output? If so, I think it makes sense to pass 'o', but if the function is oblivious that it is driven to produce part of a "diff", it feels wrong to pass 'o'. The original was taking a "FILE *" and line_prefix, so it is rather clear that the answer to the question is "yes, this is very closely tied to diff output". Now you have access to 'o', so you do not need to pass them separately. Good. Each line in its output, when incorporated in "diff" or "log -p" output, must be prefixed with the line-prefix to accomodate users of "log --graph", so I guess it cannot be helped. Your calls to emit_line_fmt() below seems to ask the line-prefix to be added, which is good, too. How does capturing these lines help moved line detection, by the way? These must never be matched with any other added or removed line in the real patch output. > @@ -426,11 +419,11 @@ static void show_submodule_header(FILE *f, const char > *path, > int fast_forward = 0, fast_backward = 0; > > if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) > - fprintf(f, "%sSubmodule %s contains untracked content\n", > - line_prefix, path); > + emit_line_fmt(o, NULL, NULL, 1, > + "Submodule %s contains untracked content\n", > path); > if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) > - fprintf(f, "%sSubmodule %s contains modified content\n", > - line_prefix, path); > + emit_line_fmt(o, NULL, NULL, 1, > + "Submodule %s contains modified content\n", path); > > if (is_null_oid(one)) > message = "(new submodule)"; emit_line() and emit_line_fmt() are both inappropriate names for a global function. These are very closely tied to diff generation, so we probably want to see some form of "diff" in their names. The fact that it is clear because its first parameter is "struct diff_options" is insufficient---"you cannot tell what context the function is meant to be used by only looking at its name" is certainly solved by its function signature, but the other issue with an overly generic name is that other codepaths in different contexts may want to use such a short and sweet name. Thanks.
[PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This prepares the code for submodules to go through the emit_line function. Signed-off-by: Stefan Beller--- diff.c | 20 +++- diff.h | 5 submodule.c | 78 ++--- submodule.h | 9 +++ 4 files changed, 56 insertions(+), 56 deletions(-) diff --git a/diff.c b/diff.c index 690794aeb8..7c8d6a5d12 100644 --- a/diff.c +++ b/diff.c @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2, ecbdata->blank_at_eof_in_postimage = (at - l2) + 1; } -static void emit_line(struct diff_options *o, const char *set, const char *reset, - int add_line_prefix, int sign, const char *line, int len) +void emit_line(struct diff_options *o, const char *set, const char *reset, + int add_line_prefix, int sign, const char *line, int len) { int has_trailing_newline, has_trailing_carriage_return; FILE *file = o->file; @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset fputc('\n', file); } -static void emit_line_fmt(struct diff_options *o, - const char *set, const char *reset, - int add_line_prefix, - const char *fmt, ...) +void emit_line_fmt(struct diff_options *o, + const char *set, const char *reset, + int add_line_prefix, + const char *fmt, ...) { struct strbuf sb = STRBUF_INIT; va_list ap; @@ -2386,8 +2386,7 @@ static void builtin_diff(const char *name_a, (!two->mode || S_ISGITLINK(two->mode))) { const char *del = diff_get_color_opt(o, DIFF_FILE_OLD); const char *add = diff_get_color_opt(o, DIFF_FILE_NEW); - show_submodule_summary(o->file, one->path ? one->path : two->path, - line_prefix, + show_submodule_summary(o, one->path ? one->path : two->path, >oid, >oid, two->dirty_submodule, meta, del, add, reset); @@ -2397,11 +2396,10 @@ static void builtin_diff(const char *name_a, (!two->mode || S_ISGITLINK(two->mode))) { const char *del = diff_get_color_opt(o, DIFF_FILE_OLD); const char *add = diff_get_color_opt(o, DIFF_FILE_NEW); - show_submodule_inline_diff(o->file, one->path ? one->path : two->path, - line_prefix, + show_submodule_inline_diff(o, one->path ? one->path : two->path, >oid, >oid, two->dirty_submodule, - meta, del, add, reset, o); + meta, del, add, reset); return; } diff --git a/diff.h b/diff.h index 5be1ee77a7..6e14100102 100644 --- a/diff.h +++ b/diff.h @@ -188,6 +188,11 @@ struct diff_options { int diff_path_counter; }; +void emit_line_fmt(struct diff_options *o, const char *set, const char *reset, + int add_line_prefix, const char *fmt, ...); +void emit_line(struct diff_options *o, const char *set, const char *reset, + int add_line_prefix, int sign, const char *line, int len); + enum color_diff { DIFF_RESET = 0, DIFF_CONTEXT = 1, diff --git a/submodule.c b/submodule.c index d3299e29c0..5996ebca44 100644 --- a/submodule.c +++ b/submodule.c @@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path, return prepare_revision_walk(rev); } -static void print_submodule_summary(struct rev_info *rev, FILE *f, - const char *line_prefix, +static void print_submodule_summary(struct rev_info *rev, + struct diff_options *o, const char *del, const char *add, const char *reset) { static const char format[] = " %m %s"; @@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f, ctx.date_mode = rev->date_mode; ctx.output_encoding = get_log_output_encoding(); strbuf_setlen(, 0); - strbuf_addstr(, line_prefix); - if (commit->object.flags & SYMMETRIC_LEFT) { - if (del) - strbuf_addstr(, del); - } -