Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-15 Thread Junio C Hamano
René Scharfe  writes:

> ...  How bad would it be to only
> implement the first part (as in the patch I just sent) without adding
> new config settings or parameters?

That probably is a good place to stop ;-)


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-15 Thread René Scharfe

Am 15.01.2017 um 11:06 schrieb Vegard Nossum:

On 15/01/2017 03:39, Junio C Hamano wrote:

René Scharfe  writes:

How about extending the context upward only up to and excluding a line
that is either empty *or* a function line?  That would limit the extra
context to a single function in the worst case.

Reducing context at the bottom with the aim to remove comments for the
next section is more tricky as it could remove part of the function
that we'd like to show if we get the boundary wrong.  How bad would it
be to keep the southern border unchanged?


I personally do not think there is any robust heuristic other than
Vegard's "a blank line may be a signal enough that lines before that
are not part of the beginning of the function", and I think your
"hence we look for a blank line but if there is a line that matches
the function header, stop there as we know we came too far back"
will be a good-enough safety measure.

I also agree with you that we probably do not want to futz with the
southern border.


You are right, trying to change the southern border in this way is not
quite reliable if there are no empty lines whatsoever and can
erroneously cause the function context to not include the bottom of the
function being changed.

I'm splitting the function boundary detection logic into separate
functions and trying to solve the above case without breaking the tests
(and adding a new test for the above case too).

I'll see if I can additionally provide some toggles (flags or config
variables) to control the new behaviour, what I had in mind was:

  -W[=preamble,=no-preamble]
  --function-context[=preamble,=no-preamble]
  diff.functionContextPreamble = 

(where the new logic is controlled by the new config variable and
overridden by the presence of =preamble or =no-preamble).


Adding comments before a function is useful, removing comments after a 
function sounds to me as only nice to have (under the assumption that 
they belong to the next function[*]).  How bad would it be to only 
implement the first part (as in the patch I just sent) without adding 
new config settings or parameters?


Thanks,
René


[*] Silly counter-example (the #endif line):
#ifdef SOMETHING
int f(...) {
// implementation for SOMETHING
}
#else
inf f(...) {
// implementation without SOMETHING
}
#endif /* SOMETHING */



Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-15 Thread René Scharfe
Am 15.01.2017 um 03:39 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>>> I am also more focused on keeping the codebase maintainable in good
>>> health by making sure that we made an effort to find a solution that
>>> is general-enough before solving a single specific problem you have
>>> today.  We may end up deciding that a blank-line heuristics gives us
>>> good enough tradeoff, but I do not want us to make a decision before
>>> thinking.
>>
>> How about extending the context upward only up to and excluding a line
>> that is either empty *or* a function line?  That would limit the extra
>> context to a single function in the worst case.
>>
>> Reducing context at the bottom with the aim to remove comments for the
>> next section is more tricky as it could remove part of the function
>> that we'd like to show if we get the boundary wrong.  How bad would it
>> be to keep the southern border unchanged?
> 
> I personally do not think there is any robust heuristic other than
> Vegard's "a blank line may be a signal enough that lines before that
> are not part of the beginning of the function", and I think your
> "hence we look for a blank line but if there is a line that matches
> the function header, stop there as we know we came too far back"
> will be a good-enough safety measure.
> 
> I also agree with you that we probably do not want to futz with the
> southern border.

A replacement patch for 2/3 with these changes would look like this:

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 8c88dbde38..9ed54cd318 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -174,11 +174,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
xdemitcb_t *ecb,
s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
 
if (xecfg->flags & XDL_EMIT_FUNCCONTEXT) {
+   char dummy[1];
long fs1, i1 = xch->i1;
 
/* Appended chunk? */
if (i1 >= xe->xdf1.nrec) {
-   char dummy[1];
long i2 = xch->i2;
 
/*
@@ -200,6 +200,10 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, 
xdemitcb_t *ecb,
}
 
fs1 = get_func_line(xe, xecfg, NULL, i1, -1);
+   while (fs1 > 0 && !is_empty_rec(>xdf1, fs1 - 1) &&
+  match_func_rec(>xdf1, xecfg, fs1 - 1,
+ dummy, sizeof(dummy)) < 0)
+   fs1--;
if (fs1 < 0)
fs1 = 0;
if (fs1 < s1) {



Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-15 Thread Vegard Nossum

On 15/01/2017 03:39, Junio C Hamano wrote:

René Scharfe  writes:


I am also more focused on keeping the codebase maintainable in good
health by making sure that we made an effort to find a solution that
is general-enough before solving a single specific problem you have
today.  We may end up deciding that a blank-line heuristics gives us
good enough tradeoff, but I do not want us to make a decision before
thinking.


You are right; I appreciate this approach.


How about extending the context upward only up to and excluding a line
that is either empty *or* a function line?  That would limit the extra
context to a single function in the worst case.

Reducing context at the bottom with the aim to remove comments for the
next section is more tricky as it could remove part of the function
that we'd like to show if we get the boundary wrong.  How bad would it
be to keep the southern border unchanged?


I personally do not think there is any robust heuristic other than
Vegard's "a blank line may be a signal enough that lines before that
are not part of the beginning of the function", and I think your
"hence we look for a blank line but if there is a line that matches
the function header, stop there as we know we came too far back"
will be a good-enough safety measure.

I also agree with you that we probably do not want to futz with the
southern border.


You are right, trying to change the southern border in this way is not
quite reliable if there are no empty lines whatsoever and can
erroneously cause the function context to not include the bottom of the
function being changed.

I'm splitting the function boundary detection logic into separate
functions and trying to solve the above case without breaking the tests
(and adding a new test for the above case too).

I'll see if I can additionally provide some toggles (flags or config
variables) to control the new behaviour, what I had in mind was:

  -W[=preamble,=no-preamble]
  --function-context[=preamble,=no-preamble]
  diff.functionContextPreamble = 

(where the new logic is controlled by the new config variable and
overridden by the presence of =preamble or =no-preamble).

Then it also shouldn't be too difficult to add

  diff..preamble = 
  diff..xpreamble = 

to override the heuristic used for function border detection in
exceptional cases.

You can argue about the naming now ;-) But I will use this for a start,
renaming/reworking it (or throwing it away) afterwards should be easy
once the code has been written.


Vegard


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-14 Thread Junio C Hamano
René Scharfe  writes:

>> I am also more focused on keeping the codebase maintainable in good
>> health by making sure that we made an effort to find a solution that
>> is general-enough before solving a single specific problem you have
>> today.  We may end up deciding that a blank-line heuristics gives us
>> good enough tradeoff, but I do not want us to make a decision before
>> thinking.
>
> How about extending the context upward only up to and excluding a line
> that is either empty *or* a function line?  That would limit the extra
> context to a single function in the worst case.
>
> Reducing context at the bottom with the aim to remove comments for the
> next section is more tricky as it could remove part of the function
> that we'd like to show if we get the boundary wrong.  How bad would it
> be to keep the southern border unchanged?

I personally do not think there is any robust heuristic other than
Vegard's "a blank line may be a signal enough that lines before that
are not part of the beginning of the function", and I think your
"hence we look for a blank line but if there is a line that matches
the function header, stop there as we know we came too far back"
will be a good-enough safety measure.

I also agree with you that we probably do not want to futz with the
southern border.

Thanks.


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-14 Thread René Scharfe

Am 14.01.2017 um 00:56 schrieb Junio C Hamano:

Vegard Nossum  writes:


The patch will work as intended and as expected for 95% of the users out
there (javadoc, Doxygen, kerneldoc, etc. all have the comment
immediately preceding the function) and fixes a very real problem for me
(and I expect many others) _today_; for the remaining 5% (who put a
blank line between their comment and the start of the function) it will
revert back to the current behaviour, so there should be no regression
for them.


I notice your 95% are all programming languages, but I am more
worried about the contents written in non programming languages
(René gave HTML an an example--there may be other types of contents
that we programmer types do not deal with every day, but Git users
depend on).

I am also more focused on keeping the codebase maintainable in good
health by making sure that we made an effort to find a solution that
is general-enough before solving a single specific problem you have
today.  We may end up deciding that a blank-line heuristics gives us
good enough tradeoff, but I do not want us to make a decision before
thinking.


How about extending the context upward only up to and excluding a line 
that is either empty *or* a function line?  That would limit the extra 
context to a single function in the worst case.


Reducing context at the bottom with the aim to remove comments for the 
next section is more tricky as it could remove part of the function that 
we'd like to show if we get the boundary wrong.  How bad would it be to 
keep the southern border unchanged?


René



Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-13 Thread Junio C Hamano
Vegard Nossum  writes:

> The patch will work as intended and as expected for 95% of the users out
> there (javadoc, Doxygen, kerneldoc, etc. all have the comment
> immediately preceding the function) and fixes a very real problem for me
> (and I expect many others) _today_; for the remaining 5% (who put a
> blank line between their comment and the start of the function) it will
> revert back to the current behaviour, so there should be no regression
> for them.

I notice your 95% are all programming languages, but I am more
worried about the contents written in non programming languages
(René gave HTML an an example--there may be other types of contents
that we programmer types do not deal with every day, but Git users
depend on).  

I am also more focused on keeping the codebase maintainable in good
health by making sure that we made an effort to find a solution that
is general-enough before solving a single specific problem you have
today.  We may end up deciding that a blank-line heuristics gives us
good enough tradeoff, but I do not want us to make a decision before
thinking.

>> The way "diff -W" codepath used it as if it were always the very
>> first line of a function was bound to invite a patch like this, and
>> if we want to be extra elaborate, I agree that an extra mechanism to
>> say "the line the funcline regexp matches is not the beginning of a
>> function, but the beginning is a line that matches this other regexp
>> before that line" may help.
>>
>> Do we really want to be that elaborate, though?  I dunno.
>
> Adding a regex instead of the simple "blank line" test doesn't seem very
> difficult to do, but I am doubtful that it will make any difference in
> practice. But that can be done incrementally as well by the people who
> would actually need it (who I strongly suspect do not exist in the first
> place).

At least, the damage can be limited to the cases we know would work
well if we go that way.


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-13 Thread Vegard Nossum

On 13/01/2017 20:51, Junio C Hamano wrote:

René Scharfe  writes:

That's true, but I'm not sure "non-empty line before function line" is
good enough a definition for desirable lines.  It wouldn't work for
people who don't believe in empty lines.  Or for those that put a
blank line between comment and function.  (I have an opinion on such
habits, but git diff should probably stay neutral.)  And that's just
for C code; I have no idea how this heuristic would hold up for other
file types like HTML.


As you are, I am fairly negative on the heuristic based on the
"non-blank" thing.  We tried once with compaction-heuristics already
and it did not quite perform well.  Let's not hardcode another one.


The patch will work as intended and as expected for 95% of the users out
there (javadoc, Doxygen, kerneldoc, etc. all have the comment
immediately preceding the function) and fixes a very real problem for me
(and I expect many others) _today_; for the remaining 5% (who put a
blank line between their comment and the start of the function) it will
revert back to the current behaviour, so there should be no regression
for them.

For the 0% who don't put even a single blank line between their
functions, it will probably not work as expected, but then again I have
never seen such a coding style in any language, so I am doubtful if this
is something that needs to be taken into account in the first place.


We can identify function lines with arbitrary precision (with a
xfuncname regex, if needed), but there is no accurate way to classify
lines as comments, or as the end of functions.  Adding optional
regexes for single- and multi-line comments would help, at least for
C.


The funcline regexp is used for two related but different purposes.
It identifies a single line to be placed on @@ ... @@ line before a
diff hunk.  This line however does not have to be at the beginning
of a function.  It has to be the line that conveys the most
significant information (e.g. the name of the function).

The way "diff -W" codepath used it as if it were always the very
first line of a function was bound to invite a patch like this, and
if we want to be extra elaborate, I agree that an extra mechanism to
say "the line the funcline regexp matches is not the beginning of a
function, but the beginning is a line that matches this other regexp
before that line" may help.

Do we really want to be that elaborate, though?  I dunno.


Adding a regex instead of the simple "blank line" test doesn't seem very
difficult to do, but I am doubtful that it will make any difference in
practice. But that can be done incrementally as well by the people who
would actually need it (who I strongly suspect do not exist in the first
place).


I wonder if it would be sufficient to make -W take an optional
number, e.g. "git show -W4", to add extre context lines before the
funcline.



I don't like specifying a fixed number, that negates almost all the
reason for using -W in the first place; I would much prefer adding
a config variable to control the -W behaviour (or a new diff flag).


Vegard


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-13 Thread Junio C Hamano
René Scharfe  writes:

> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.
>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>>  void
>>  dummy(void)
>>  {
>>  ...
>>  }
>>
>
> That's true, but I'm not sure "non-empty line before function line" is
> good enough a definition for desirable lines.  It wouldn't work for
> people who don't believe in empty lines.  Or for those that put a
> blank line between comment and function.  (I have an opinion on such
> habits, but git diff should probably stay neutral.)  And that's just
> for C code; I have no idea how this heuristic would hold up for other
> file types like HTML.

As you are, I am fairly negative on the heuristic based on the
"non-blank" thing.  We tried once with compaction-heuristics already
and it did not quite perform well.  Let's not hardcode another one.

> We can identify function lines with arbitrary precision (with a
> xfuncname regex, if needed), but there is no accurate way to classify
> lines as comments, or as the end of functions.  Adding optional
> regexes for single- and multi-line comments would help, at least for
> C.

The funcline regexp is used for two related but different purposes.
It identifies a single line to be placed on @@ ... @@ line before a
diff hunk.  This line however does not have to be at the beginning
of a function.  It has to be the line that conveys the most
significant information (e.g. the name of the function).

The way "diff -W" codepath used it as if it were always the very
first line of a function was bound to invite a patch like this, and
if we want to be extra elaborate, I agree that an extra mechanism to
say "the line the funcline regexp matches is not the beginning of a
function, but the beginning is a line that matches this other regexp
before that line" may help.

Do we really want to be that elaborate, though?  I dunno.

I wonder if it would be sufficient to make -W take an optional
number, e.g. "git show -W4", to add extre context lines before the
funcline.


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-13 Thread Stefan Beller
On Fri, Jan 13, 2017 at 10:19 AM, René Scharfe  wrote:
> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>>
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.

Do we need a small comment in the actual code to hint at why we count
upwards there?

>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>> void
>> dummy(void)
>> {
>> ...
>> }
>>
>
> That's true, but I'm not sure "non-empty line before function line" is good
> enough a definition for desirable lines.  It wouldn't work for people who
> don't believe in empty lines.  Or for those that put a blank line between
> comment and function.  (I have an opinion on such habits, but git diff
> should probably stay neutral.)  And that's just for C code; I have no idea
> how this heuristic would hold up for other file types like HTML.

I think empty lines are "good as a first approach", see e.g.
433860f3d0beb0c6 the "compaction" heuristic for a similar
thing (the compaction was introduced at d634d61ed), and then
we can build a more elaborate thing on top.

>
> We can identify function lines with arbitrary precision (with a xfuncname
> regex, if needed), but there is no accurate way to classify lines as
> comments, or as the end of functions.  Adding optional regexes for single-
> and multi-line comments would help, at least for C.

That would cover Java and whole lot of other C like languages. So a good
start as well IMHO.

>
> René


Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

2017-01-13 Thread René Scharfe

Am 13.01.2017 um 17:15 schrieb Vegard Nossum:

When using -W to include the whole function in the diff context, you
are typically doing this to be able to review the change in its entirety
within the context of the function. It is therefore almost always
desirable to include any comments that immediately precede the function.

This also the fixes the case for C where the declaration is split across
multiple lines (where the first line of the declaration would not be
included in the output), e.g.:

void
dummy(void)
{
...
}



That's true, but I'm not sure "non-empty line before function line" is 
good enough a definition for desirable lines.  It wouldn't work for 
people who don't believe in empty lines.  Or for those that put a blank 
line between comment and function.  (I have an opinion on such habits, 
but git diff should probably stay neutral.)  And that's just for C code; 
I have no idea how this heuristic would hold up for other file types 
like HTML.


We can identify function lines with arbitrary precision (with a 
xfuncname regex, if needed), but there is no accurate way to classify 
lines as comments, or as the end of functions.  Adding optional regexes 
for single- and multi-line comments would help, at least for C.


René