Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
René Scharfewrites: > ... How bad would it be to only > implement the first part (as in the patch I just sent) without adding > new config settings or parameters? That probably is a good place to stop ;-)
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
Am 15.01.2017 um 11:06 schrieb Vegard Nossum: On 15/01/2017 03:39, Junio C Hamano wrote: René Scharfewrites: How about extending the context upward only up to and excluding a line that is either empty *or* a function line? That would limit the extra context to a single function in the worst case. Reducing context at the bottom with the aim to remove comments for the next section is more tricky as it could remove part of the function that we'd like to show if we get the boundary wrong. How bad would it be to keep the southern border unchanged? I personally do not think there is any robust heuristic other than Vegard's "a blank line may be a signal enough that lines before that are not part of the beginning of the function", and I think your "hence we look for a blank line but if there is a line that matches the function header, stop there as we know we came too far back" will be a good-enough safety measure. I also agree with you that we probably do not want to futz with the southern border. You are right, trying to change the southern border in this way is not quite reliable if there are no empty lines whatsoever and can erroneously cause the function context to not include the bottom of the function being changed. I'm splitting the function boundary detection logic into separate functions and trying to solve the above case without breaking the tests (and adding a new test for the above case too). I'll see if I can additionally provide some toggles (flags or config variables) to control the new behaviour, what I had in mind was: -W[=preamble,=no-preamble] --function-context[=preamble,=no-preamble] diff.functionContextPreamble = (where the new logic is controlled by the new config variable and overridden by the presence of =preamble or =no-preamble). Adding comments before a function is useful, removing comments after a function sounds to me as only nice to have (under the assumption that they belong to the next function[*]). How bad would it be to only implement the first part (as in the patch I just sent) without adding new config settings or parameters? Thanks, René [*] Silly counter-example (the #endif line): #ifdef SOMETHING int f(...) { // implementation for SOMETHING } #else inf f(...) { // implementation without SOMETHING } #endif /* SOMETHING */
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
Am 15.01.2017 um 03:39 schrieb Junio C Hamano: > René Scharfewrites: > >>> I am also more focused on keeping the codebase maintainable in good >>> health by making sure that we made an effort to find a solution that >>> is general-enough before solving a single specific problem you have >>> today. We may end up deciding that a blank-line heuristics gives us >>> good enough tradeoff, but I do not want us to make a decision before >>> thinking. >> >> How about extending the context upward only up to and excluding a line >> that is either empty *or* a function line? That would limit the extra >> context to a single function in the worst case. >> >> Reducing context at the bottom with the aim to remove comments for the >> next section is more tricky as it could remove part of the function >> that we'd like to show if we get the boundary wrong. How bad would it >> be to keep the southern border unchanged? > > I personally do not think there is any robust heuristic other than > Vegard's "a blank line may be a signal enough that lines before that > are not part of the beginning of the function", and I think your > "hence we look for a blank line but if there is a line that matches > the function header, stop there as we know we came too far back" > will be a good-enough safety measure. > > I also agree with you that we probably do not want to futz with the > southern border. A replacement patch for 2/3 with these changes would look like this: diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 8c88dbde38..9ed54cd318 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -174,11 +174,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0); if (xecfg->flags & XDL_EMIT_FUNCCONTEXT) { + char dummy[1]; long fs1, i1 = xch->i1; /* Appended chunk? */ if (i1 >= xe->xdf1.nrec) { - char dummy[1]; long i2 = xch->i2; /* @@ -200,6 +200,10 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, } fs1 = get_func_line(xe, xecfg, NULL, i1, -1); + while (fs1 > 0 && !is_empty_rec(>xdf1, fs1 - 1) && + match_func_rec(>xdf1, xecfg, fs1 - 1, + dummy, sizeof(dummy)) < 0) + fs1--; if (fs1 < 0) fs1 = 0; if (fs1 < s1) {
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
On 15/01/2017 03:39, Junio C Hamano wrote: René Scharfewrites: I am also more focused on keeping the codebase maintainable in good health by making sure that we made an effort to find a solution that is general-enough before solving a single specific problem you have today. We may end up deciding that a blank-line heuristics gives us good enough tradeoff, but I do not want us to make a decision before thinking. You are right; I appreciate this approach. How about extending the context upward only up to and excluding a line that is either empty *or* a function line? That would limit the extra context to a single function in the worst case. Reducing context at the bottom with the aim to remove comments for the next section is more tricky as it could remove part of the function that we'd like to show if we get the boundary wrong. How bad would it be to keep the southern border unchanged? I personally do not think there is any robust heuristic other than Vegard's "a blank line may be a signal enough that lines before that are not part of the beginning of the function", and I think your "hence we look for a blank line but if there is a line that matches the function header, stop there as we know we came too far back" will be a good-enough safety measure. I also agree with you that we probably do not want to futz with the southern border. You are right, trying to change the southern border in this way is not quite reliable if there are no empty lines whatsoever and can erroneously cause the function context to not include the bottom of the function being changed. I'm splitting the function boundary detection logic into separate functions and trying to solve the above case without breaking the tests (and adding a new test for the above case too). I'll see if I can additionally provide some toggles (flags or config variables) to control the new behaviour, what I had in mind was: -W[=preamble,=no-preamble] --function-context[=preamble,=no-preamble] diff.functionContextPreamble = (where the new logic is controlled by the new config variable and overridden by the presence of =preamble or =no-preamble). Then it also shouldn't be too difficult to add diff..preamble = diff..xpreamble = to override the heuristic used for function border detection in exceptional cases. You can argue about the naming now ;-) But I will use this for a start, renaming/reworking it (or throwing it away) afterwards should be easy once the code has been written. Vegard
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
René Scharfewrites: >> I am also more focused on keeping the codebase maintainable in good >> health by making sure that we made an effort to find a solution that >> is general-enough before solving a single specific problem you have >> today. We may end up deciding that a blank-line heuristics gives us >> good enough tradeoff, but I do not want us to make a decision before >> thinking. > > How about extending the context upward only up to and excluding a line > that is either empty *or* a function line? That would limit the extra > context to a single function in the worst case. > > Reducing context at the bottom with the aim to remove comments for the > next section is more tricky as it could remove part of the function > that we'd like to show if we get the boundary wrong. How bad would it > be to keep the southern border unchanged? I personally do not think there is any robust heuristic other than Vegard's "a blank line may be a signal enough that lines before that are not part of the beginning of the function", and I think your "hence we look for a blank line but if there is a line that matches the function header, stop there as we know we came too far back" will be a good-enough safety measure. I also agree with you that we probably do not want to futz with the southern border. Thanks.
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
Am 14.01.2017 um 00:56 schrieb Junio C Hamano: Vegard Nossumwrites: The patch will work as intended and as expected for 95% of the users out there (javadoc, Doxygen, kerneldoc, etc. all have the comment immediately preceding the function) and fixes a very real problem for me (and I expect many others) _today_; for the remaining 5% (who put a blank line between their comment and the start of the function) it will revert back to the current behaviour, so there should be no regression for them. I notice your 95% are all programming languages, but I am more worried about the contents written in non programming languages (René gave HTML an an example--there may be other types of contents that we programmer types do not deal with every day, but Git users depend on). I am also more focused on keeping the codebase maintainable in good health by making sure that we made an effort to find a solution that is general-enough before solving a single specific problem you have today. We may end up deciding that a blank-line heuristics gives us good enough tradeoff, but I do not want us to make a decision before thinking. How about extending the context upward only up to and excluding a line that is either empty *or* a function line? That would limit the extra context to a single function in the worst case. Reducing context at the bottom with the aim to remove comments for the next section is more tricky as it could remove part of the function that we'd like to show if we get the boundary wrong. How bad would it be to keep the southern border unchanged? René
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
Vegard Nossumwrites: > The patch will work as intended and as expected for 95% of the users out > there (javadoc, Doxygen, kerneldoc, etc. all have the comment > immediately preceding the function) and fixes a very real problem for me > (and I expect many others) _today_; for the remaining 5% (who put a > blank line between their comment and the start of the function) it will > revert back to the current behaviour, so there should be no regression > for them. I notice your 95% are all programming languages, but I am more worried about the contents written in non programming languages (René gave HTML an an example--there may be other types of contents that we programmer types do not deal with every day, but Git users depend on). I am also more focused on keeping the codebase maintainable in good health by making sure that we made an effort to find a solution that is general-enough before solving a single specific problem you have today. We may end up deciding that a blank-line heuristics gives us good enough tradeoff, but I do not want us to make a decision before thinking. >> The way "diff -W" codepath used it as if it were always the very >> first line of a function was bound to invite a patch like this, and >> if we want to be extra elaborate, I agree that an extra mechanism to >> say "the line the funcline regexp matches is not the beginning of a >> function, but the beginning is a line that matches this other regexp >> before that line" may help. >> >> Do we really want to be that elaborate, though? I dunno. > > Adding a regex instead of the simple "blank line" test doesn't seem very > difficult to do, but I am doubtful that it will make any difference in > practice. But that can be done incrementally as well by the people who > would actually need it (who I strongly suspect do not exist in the first > place). At least, the damage can be limited to the cases we know would work well if we go that way.
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
On 13/01/2017 20:51, Junio C Hamano wrote: René Scharfewrites: That's true, but I'm not sure "non-empty line before function line" is good enough a definition for desirable lines. It wouldn't work for people who don't believe in empty lines. Or for those that put a blank line between comment and function. (I have an opinion on such habits, but git diff should probably stay neutral.) And that's just for C code; I have no idea how this heuristic would hold up for other file types like HTML. As you are, I am fairly negative on the heuristic based on the "non-blank" thing. We tried once with compaction-heuristics already and it did not quite perform well. Let's not hardcode another one. The patch will work as intended and as expected for 95% of the users out there (javadoc, Doxygen, kerneldoc, etc. all have the comment immediately preceding the function) and fixes a very real problem for me (and I expect many others) _today_; for the remaining 5% (who put a blank line between their comment and the start of the function) it will revert back to the current behaviour, so there should be no regression for them. For the 0% who don't put even a single blank line between their functions, it will probably not work as expected, but then again I have never seen such a coding style in any language, so I am doubtful if this is something that needs to be taken into account in the first place. We can identify function lines with arbitrary precision (with a xfuncname regex, if needed), but there is no accurate way to classify lines as comments, or as the end of functions. Adding optional regexes for single- and multi-line comments would help, at least for C. The funcline regexp is used for two related but different purposes. It identifies a single line to be placed on @@ ... @@ line before a diff hunk. This line however does not have to be at the beginning of a function. It has to be the line that conveys the most significant information (e.g. the name of the function). The way "diff -W" codepath used it as if it were always the very first line of a function was bound to invite a patch like this, and if we want to be extra elaborate, I agree that an extra mechanism to say "the line the funcline regexp matches is not the beginning of a function, but the beginning is a line that matches this other regexp before that line" may help. Do we really want to be that elaborate, though? I dunno. Adding a regex instead of the simple "blank line" test doesn't seem very difficult to do, but I am doubtful that it will make any difference in practice. But that can be done incrementally as well by the people who would actually need it (who I strongly suspect do not exist in the first place). I wonder if it would be sufficient to make -W take an optional number, e.g. "git show -W4", to add extre context lines before the funcline. I don't like specifying a fixed number, that negates almost all the reason for using -W in the first place; I would much prefer adding a config variable to control the -W behaviour (or a new diff flag). Vegard
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
René Scharfewrites: > Am 13.01.2017 um 17:15 schrieb Vegard Nossum: >> When using -W to include the whole function in the diff context, you >> are typically doing this to be able to review the change in its entirety >> within the context of the function. It is therefore almost always >> desirable to include any comments that immediately precede the function. >> >> This also the fixes the case for C where the declaration is split across >> multiple lines (where the first line of the declaration would not be >> included in the output), e.g.: >> >> void >> dummy(void) >> { >> ... >> } >> > > That's true, but I'm not sure "non-empty line before function line" is > good enough a definition for desirable lines. It wouldn't work for > people who don't believe in empty lines. Or for those that put a > blank line between comment and function. (I have an opinion on such > habits, but git diff should probably stay neutral.) And that's just > for C code; I have no idea how this heuristic would hold up for other > file types like HTML. As you are, I am fairly negative on the heuristic based on the "non-blank" thing. We tried once with compaction-heuristics already and it did not quite perform well. Let's not hardcode another one. > We can identify function lines with arbitrary precision (with a > xfuncname regex, if needed), but there is no accurate way to classify > lines as comments, or as the end of functions. Adding optional > regexes for single- and multi-line comments would help, at least for > C. The funcline regexp is used for two related but different purposes. It identifies a single line to be placed on @@ ... @@ line before a diff hunk. This line however does not have to be at the beginning of a function. It has to be the line that conveys the most significant information (e.g. the name of the function). The way "diff -W" codepath used it as if it were always the very first line of a function was bound to invite a patch like this, and if we want to be extra elaborate, I agree that an extra mechanism to say "the line the funcline regexp matches is not the beginning of a function, but the beginning is a line that matches this other regexp before that line" may help. Do we really want to be that elaborate, though? I dunno. I wonder if it would be sufficient to make -W take an optional number, e.g. "git show -W4", to add extre context lines before the funcline.
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
On Fri, Jan 13, 2017 at 10:19 AM, René Scharfewrote: > Am 13.01.2017 um 17:15 schrieb Vegard Nossum: >> >> When using -W to include the whole function in the diff context, you >> are typically doing this to be able to review the change in its entirety >> within the context of the function. It is therefore almost always >> desirable to include any comments that immediately precede the function. Do we need a small comment in the actual code to hint at why we count upwards there? >> >> This also the fixes the case for C where the declaration is split across >> multiple lines (where the first line of the declaration would not be >> included in the output), e.g.: >> >> void >> dummy(void) >> { >> ... >> } >> > > That's true, but I'm not sure "non-empty line before function line" is good > enough a definition for desirable lines. It wouldn't work for people who > don't believe in empty lines. Or for those that put a blank line between > comment and function. (I have an opinion on such habits, but git diff > should probably stay neutral.) And that's just for C code; I have no idea > how this heuristic would hold up for other file types like HTML. I think empty lines are "good as a first approach", see e.g. 433860f3d0beb0c6 the "compaction" heuristic for a similar thing (the compaction was introduced at d634d61ed), and then we can build a more elaborate thing on top. > > We can identify function lines with arbitrary precision (with a xfuncname > regex, if needed), but there is no accurate way to classify lines as > comments, or as the end of functions. Adding optional regexes for single- > and multi-line comments would help, at least for C. That would cover Java and whole lot of other C like languages. So a good start as well IMHO. > > René
Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
Am 13.01.2017 um 17:15 schrieb Vegard Nossum: When using -W to include the whole function in the diff context, you are typically doing this to be able to review the change in its entirety within the context of the function. It is therefore almost always desirable to include any comments that immediately precede the function. This also the fixes the case for C where the declaration is split across multiple lines (where the first line of the declaration would not be included in the output), e.g.: void dummy(void) { ... } That's true, but I'm not sure "non-empty line before function line" is good enough a definition for desirable lines. It wouldn't work for people who don't believe in empty lines. Or for those that put a blank line between comment and function. (I have an opinion on such habits, but git diff should probably stay neutral.) And that's just for C code; I have no idea how this heuristic would hold up for other file types like HTML. We can identify function lines with arbitrary precision (with a xfuncname regex, if needed), but there is no accurate way to classify lines as comments, or as the end of functions. Adding optional regexes for single- and multi-line comments would help, at least for C. René