Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-06 Thread Junio C Hamano
Jonathan Tan  writes:

> How important is this feature? It doesn't seem too difficult to add,
> although it does break compatibility (in particular, "--signoff" must
> now be documented as "after the last trailer" instead of "at the end
> of the commit message").

The sign-off has always been meant to be appended at the end of the
trailer block, so there is no "compatibility" issue.  It's just how
you phrase the explanation of the behaviour, I would think.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-05 Thread Junio C Hamano
Jonathan Tan  writes:

> Sounds reasonable to me. Would the "[" be a bit of overspecification,
> though, since Git doesn't produce it? Also, identifying it as a
> garbage line probably wouldn't change any behavior - in the Linux
> kernel examples, it is used to show what happened in between
> sign-offs, so there will always be one "Signed-off-by:" at the top.

Good thinking.  As "interpret trailers" cannot locate such a line to
manipulate (as it lacks ), we can simply treat it as a
garbage line.

>> struct {
>>  const char *whole;
>>  const char *end_of_message_proper;
>>  struct {
>>  const char *token;
>>  const char *contents;
>>  } *trailer;
>>  int alloc_trailers, nr_trailers;
>> };
>>
>> where
>>
>>  - whole points at the first byte of the input, i.e. the beginning
>>of the commit message buffer.
>>
>>  - end-of-message-proper points at the first byte of the trailer
>>block into the buffer at "whole".
>>
>>  - token is a canonical header name for easy comparison for
>>interpret-trailers (you can use NULL for garbage lines, and made
>>up token like "[bracket]" and "(cherrypick)" that would not clash
>>with real tokens like "Signed-off-by").
>>
>>  - contents is the bytes on the logical line, including the header
>>part
>>
>> E.g. an element in trailer[] array may say
>>
>> {
>>  .token = "Signed-off-by",
>> .contents = "Signed-Off-By: Some Body \n",
>> }
>
> I get the impression from the rest of your e-mail that no strings are
> meant to be copied - is that true? (That sounds like a good idea to
> me.)

I was envisioning that "whole", "end-of-message" can point into the
input buffer, while trailer[].contents may have to be copied, if
only to make it to a NUL-terminated string.  But I am fine with
 or  pair to avoid copying .contents if that is
desired.  You'd need to worry about differentiating .contents that
need to be freed after "interpret trailers" inserted a new entry or
replaced the contents, though.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-05 Thread Jonathan Tan

On 10/04/2016 11:28 AM, Junio C Hamano wrote:

An addendum.  We may also want to be prepared to accept an input
that has some garbage lines _after_ the trailer block, if we can
clearly identify them as such.  For example, we could change the
definition of "the last paragraph" as "the block of lines that
do not have any empty (or blank) line, that appear either at the end
of the input, or immediately before three-dash lines", to allow

commit title

explanation of the change

Signed-off-by: Some Body 
[some other things]
Acked-by: Some Other Person 

---
 additional comment

which (unfortunately) is a rather common pattern for people who plan
to send the commit over e-mail.

If we add a new field "const char *beginning_of_tail_garbage" next
to "end_of_message_proper" that points at the blank line before the
three-dash line in the above example, the parser should be able to
break such an input into a parsed form, allow the trailer[] array to
be manipulated and reproduce a commit log message.


How important is this feature? It doesn't seem too difficult to add, 
although it does break compatibility (in particular, "--signoff" must 
now be documented as "after the last trailer" instead of "at the end of 
the commit message").


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-05 Thread Jonathan Tan

On 10/04/2016 10:25 AM, Junio C Hamano wrote:

So I would say it is perfectly OK if your update works only for
cases we can clearly define the semantics for.  For example, we can
even start with something simple like:

 * A RFC822-header like line, together with any number of whitespace
   indented lines that immediately follow it, will be taken as a
   single logical trailer element (with embedded LF in it if it uses
   the "line folding").  For the purpose of "replace", the entire
   single logical trailer element is replaced.

 * A line that begins with "(cherry picked from" and "[" becomes a
   single logical trailer element.  No continuation of anything
   fancy.

 * A line with any other shape is a garbage line in a trailer
   block.  It is kept in its place, but because it does not even
   have  part, it will not participate in locating with
   "trailer.where", "trailer.ifexists", etc.


Sounds reasonable to me. Would the "[" be a bit of overspecification, 
though, since Git doesn't produce it? Also, identifying it as a garbage 
line probably wouldn't change any behavior - in the Linux kernel 
examples, it is used to show what happened in between sign-offs, so 
there will always be one "Signed-off-by:" at the top.  (But I do not 
feel strongly about this.)



A block of lines that appear as the last paragraph in a commit
message is a trailer block if and only if certain number or
percentage of lines are non-garbage lines according to the above
definition.


I think the number should be 1 - that seems like the easiest to explain. 
But I'm OK with other suggestions.



I wonder if we can share a new helper function to do the detection
(and classification) of a trailer block and parsing the logical
lines out of a commit log message.  The function signature could be
as simple as taking a single  (or a strbuf) that holds
a commit log message, and splitting it out into something like:

struct {
const char *whole;
const char *end_of_message_proper;
struct {
const char *token;
const char *contents;
} *trailer;
int alloc_trailers, nr_trailers;
};

where

 - whole points at the first byte of the input, i.e. the beginning
   of the commit message buffer.

 - end-of-message-proper points at the first byte of the trailer
   block into the buffer at "whole".

 - token is a canonical header name for easy comparison for
   interpret-trailers (you can use NULL for garbage lines, and made
   up token like "[bracket]" and "(cherrypick)" that would not clash
   with real tokens like "Signed-off-by").

 - contents is the bytes on the logical line, including the header
   part

E.g. an element in trailer[] array may say

{
.token = "Signed-off-by",
.contents = "Signed-Off-By: Some Body \n",
}


I get the impression from the rest of your e-mail that no strings are 
meant to be copied - is that true? (That sounds like a good idea to me.) 
In which case this might be better:


  struct {
const char *first_trailer; /* = end_of_message_proper */
struct {
  const char *start;
  const char *value;
  const char *end;
} *trailers;
int trailers_nr, trailers_alloc;
  };

start = value for "[", "(cherry picked from" and garbage lines. We also 
need end because there is no \0 there (we didn't copy any strings).


The existing code (in trailer.c) uses a linked list to store trailers, 
but an array (as written in your e-mail) is probably better for us since 
clients would want to access the last element (as also written in your 
e-mail).



With something like that, you can manipulate the "insert at ...",
"replace", etc. in the trailer[] array and then produce an updated
commit message fairly easily (i.e. copy out the bytes beginning at
"whole" up to "end_of_message_proper", then iterate over trailer[]
array and show their contents field).  The codepaths in the core
part only need to know

 - how to check the last item in trailer[] array to see if it ends
   with the same sign-off as they are trying to add.

 - how to append one new element to the trailer[] array.

 - reproduce an updated commit log message after the above.


I don't think we need trailer block struct -> commit message conversion 
- when adding a new trailer or replacing an existing trailer, the client 
code can just remember the index and then modify its behavior 
accordingly when iterating through all trailers. But this conversion can 
be easily added if/when we need it.


> Hmm?

Overall, this seems like a good idea - I'll go ahead and do this if 
there are no other objections.


It just occurred to me that there could be some corner cases when the 
trailer separator is configured to not include ":" - I'll make sure to 
include tests that check those corner cases.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-04 Thread Junio C Hamano
Junio C Hamano  writes:

> A block of lines that appear as the last paragraph in a commit
> message is a trailer block if and only if certain number or
> percentage of lines are non-garbage lines according to the above
> definition.
> ...
> I wonder if we can share a new helper function to do the detection
> (and classification) of a trailer block and parsing the logical
> lines out of a commit log message.  The function signature could be
> as simple as taking a single  (or a strbuf) that holds
> a commit log message, and splitting it out into something like:
>
> struct {
>   const char *whole;
>   const char *end_of_message_proper;
>   struct {
>   const char *token;
>   const char *contents;
>   } *trailer;
>   int alloc_trailers, nr_trailers;
> };
>
> where 
> ...

An addendum.  We may also want to be prepared to accept an input
that has some garbage lines _after_ the trailer block, if we can
clearly identify them as such.  For example, we could change the
definition of "the last paragraph" as "the block of lines that
do not have any empty (or blank) line, that appear either at the end
of the input, or immediately before three-dash lines", to allow

commit title

explanation of the change

Signed-off-by: Some Body 
[some other things]
Acked-by: Some Other Person 

---
 additional comment

which (unfortunately) is a rather common pattern for people who plan
to send the commit over e-mail.

If we add a new field "const char *beginning_of_tail_garbage" next
to "end_of_message_proper" that points at the blank line before the
three-dash line in the above example, the parser should be able to
break such an input into a parsed form, allow the trailer[] array to
be manipulated and reproduce a commit log message.



Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-04 Thread Junio C Hamano
Jonathan Tan  writes:

> One alternative is to postpone this decision by changing sequencer
> only (and not trailer) to tolerate other lines in the trailer. This
> would make them even more divergent (sequencer supports arbitrary
> lines while trailer doesn't), but they were divergent already
> (sequencer supports "(cherry picked by" but trailer doesn't).

Given that we internally do not use the "trailers" for anything
real, anything you decide to do here would be an improvement ;-)
Before, users couldn't even get any of the examples (below, from
your message) recognized as trailer blocks.

>   Signed-off-by: A 
>   [This has nothing to do with the above line]
>   Signed-off-by: B 
>
> and:
>
>   Link 1: a link
> a continuation of the above
>
> and:
>
>   Signed-off-by: Some body  (comment
>   on two lines)

So I would say it is perfectly OK if your update works only for
cases we can clearly define the semantics for.  For example, we can
even start with something simple like:

 * A RFC822-header like line, together with any number of whitespace
   indented lines that immediately follow it, will be taken as a
   single logical trailer element (with embedded LF in it if it uses
   the "line folding").  For the purpose of "replace", the entire
   single logical trailer element is replaced.

 * A line that begins with "(cherry picked from" and "[" becomes a
   single logical trailer element.  No continuation of anything
   fancy.

 * A line with any other shape is a garbage line in a trailer
   block.  It is kept in its place, but because it does not even
   have  part, it will not participate in locating with
   "trailer.where", "trailer.ifexists", etc.

A block of lines that appear as the last paragraph in a commit
message is a trailer block if and only if certain number or
percentage of lines are non-garbage lines according to the above
definition.

The operations done by the codepaths in the core part of the system
are much simpler subset of what "interpret-trailers" wants to
support, namely, 

 - append "(cherry picked from X)" at the end.

 - append "S-o-b:" at the end.

 - append "S-o-b:" unless the same line appears as the last line in
   the existing trailer block.

and these are quite compatible with a simplified definition of what
a logical line is illustrated in the above example, I would think.

I wonder if we can share a new helper function to do the detection
(and classification) of a trailer block and parsing the logical
lines out of a commit log message.  The function signature could be
as simple as taking a single  (or a strbuf) that holds
a commit log message, and splitting it out into something like:

struct {
const char *whole;
const char *end_of_message_proper;
struct {
const char *token;
const char *contents;
} *trailer;
int alloc_trailers, nr_trailers;
};

where 

 - whole points at the first byte of the input, i.e. the beginning
   of the commit message buffer.

 - end-of-message-proper points at the first byte of the trailer
   block into the buffer at "whole".

 - token is a canonical header name for easy comparison for
   interpret-trailers (you can use NULL for garbage lines, and made
   up token like "[bracket]" and "(cherrypick)" that would not clash
   with real tokens like "Signed-off-by").

 - contents is the bytes on the logical line, including the header
   part

E.g. an element in trailer[] array may say

{
.token = "Signed-off-by",
.contents = "Signed-Off-By: Some Body \n",
}

With something like that, you can manipulate the "insert at ...",
"replace", etc. in the trailer[] array and then produce an updated
commit message fairly easily (i.e. copy out the bytes beginning at
"whole" up to "end_of_message_proper", then iterate over trailer[]
array and show their contents field).  The codepaths in the core
part only need to know 

 - how to check the last item in trailer[] array to see if it ends
   with the same sign-off as they are trying to add.

 - how to append one new element to the trailer[] array.

 - reproduce an updated commit log message after the above.

Hmm?


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 10/03/2016 03:13 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


There are other options like checking for indentation or checking for
balanced parentheses/brackets, but I think that these would lead to
surprising behavior for the user (this would mean that whitespace or
certain characters could turn a valid trailer into an invalid one or
vice versa, or change the behavior of trailer.ifexists, especially
"replace").


Yes, that is exactly why I said that it may be necessary for the
code to analize the lines in a block identified as "likely to be a
trailing block" more carefully.  We can afford to be loose as long
as the only allowed operation is to append one at the end, but once
we start removing/replacing an existing entry, etc., the definition
of what an entry is becomes very much relevant.


I agree, and I was trying to discuss the possible alternatives for the 
definition of what an entry is in my previous e-mail.


If you think that the alternatives are still too loose, I'm not sure if 
we can make it any tighter. As far as I know, we're dealing with 
trailers like the following:


  Signed-off-by: A 
  [This has nothing to do with the above line]
  Signed-off-by: B 

and:

  Link 1: a link
a continuation of the above

and:

  Signed-off-by: Some body  (comment
  on two lines)

As I stated in the quoted paragraph, one possibility is to use 
indentation and/or balanced parentheses/brackets to determine if a 
trailer line continues onto the next line, and this would handle all the 
above cases, but I still think that these would lead to surprising 
behavior. Hence my suggestion to just simply define it as a single 
physical line. But if you think that the pros (of the more complicated 
approach) outweigh the cons, I'm OK with that.


One alternative is to postpone this decision by changing sequencer only 
(and not trailer) to tolerate other lines in the trailer. This would 
make them even more divergent (sequencer supports arbitrary lines while 
trailer doesn't), but they were divergent already (sequencer supports 
"(cherry picked by" but trailer doesn't).


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> There are other options like checking for indentation or checking for
> balanced parentheses/brackets, but I think that these would lead to
> surprising behavior for the user (this would mean that whitespace or
> certain characters could turn a valid trailer into an invalid one or
> vice versa, or change the behavior of trailer.ifexists, especially
> "replace").

Yes, that is exactly why I said that it may be necessary for the
code to analize the lines in a block identified as "likely to be a
trailing block" more carefully.  We can afford to be loose as long
as the only allowed operation is to append one at the end, but once
we start removing/replacing an existing entry, etc., the definition
of what an entry is becomes very much relevant.



Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 10/03/2016 12:17 PM, Junio C Hamano wrote:

It may be necessary for the code to analize the lines in a block
identified as "likely to be a trailing block" more carefully,
though.  The example 59f0aa94 in the message you are responding to
has "Link 1:" that consists of 3 physical lines.  An instruction to
interpret-trailers to add a new one _after_ "Link-$n:" may have to
treat these as a single logical line and do the addition after them,
i.e. before "Link 2:" line, for example.

I also saw

Signed-off-by: Some body  (some comment
that extends to the next line without being indented)
Sined-off-by: Some body Else 

where the only clue that the second line is logically a part of the
first one was the balancing of parentheses (or [brakets]).  To
accomodate real-world use cases, you may have to take into account a
lot more than the strict rfc-822 style "line folding".


If we define a logical trailer line as the set of physical lines until 
the next trailer line...it is true that this has some precedence in that 
such lines can be written (although this might not be intentional):


  $ git interpret-trailers --trailer "a=foo
  bar" 1. Checking for existence (trailer.ifexists) is now conceptually more 
difficult. For example, handling of inner whitespace in between lines 
might be confusing (currently, there is only one line, and whitespace 
handling is clearly described).

2. Replacement (trailer.ifexists=replace) might replace more than expected.
3. There is a corner case - the first line might not be a trailer line. 
Even if interpret-trailers knew about "(cherry picked from", a user 
might enter something else there. (We could just declare such blocks as 
non-trailers, but if we are already loosening the definition of a 
trailer, this might be something that we should allow.)


My initial thought was to think of a trailer as a block of trailer lines 
possibly interspersed with other lines. This leads to interpret-trailers 
placing the trailer in the wrong place if trailer.where=after, as you 
describe, but this might not be a big problem if trailer.where=after is 
not widely used (and it is not the default). (The other options behave 
as expected.)


There are other options like checking for indentation or checking for 
balanced parentheses/brackets, but I think that these would lead to 
surprising behavior for the user (this would mean that whitespace or 
certain characters could turn a valid trailer into an invalid one or 
vice versa, or change the behavior of trailer.ifexists, especially 
"replace").


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> That sounds reasonable to me. Would a patch set to implement this new
> trailer block heuristic (in both sequencer and trailer) be reasonable?
> And if yes, should trailer know about the "(cherry picked from"
> prefix? (I can see it both ways - knowing about the "(cherry picked
> from" prefix would make it consistent with sequencer, but it seems
> like a detail that it shouldn't know about. Writing
> "Cherry-picked-from:" instead probably wouldn't solve our problem
> because, for backwards compatibility, we would still need to support
> reading the old format.)

If we were to go that route, I'd suggest keeping the historical
practice supported, exactly because you would need to be prepared to
cherry-pick an old commit.

It may be necessary for the code to analize the lines in a block
identified as "likely to be a trailing block" more carefully,
though.  The example 59f0aa94 in the message you are responding to
has "Link 1:" that consists of 3 physical lines.  An instruction to
interpret-trailers to add a new one _after_ "Link-$n:" may have to
treat these as a single logical line and do the addition after them,
i.e. before "Link 2:" line, for example.

I also saw

Signed-off-by: Some body  (some comment
that extends to the next line without being indented)
Sined-off-by: Some body Else 

where the only clue that the second line is logically a part of the
first one was the balancing of parentheses (or [brakets]).  To
accomodate real-world use cases, you may have to take into account a
lot more than the strict rfc-822 style "line folding".




Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 09/30/2016 01:49 PM, Junio C Hamano wrote:

Junio C Hamano  writes:


Jonathan Tan  writes:


I vaguely recall that there were some discussion on the definition
of "what's a trailer line" with folks from the kernel land, perhaps
while discussing the interpret-trailers topic.  IIRC, when somebody
passes an improved version along, the resulting message's trailer
block may look like this:

Signed-off-by: Original Author 
[fixed typo in the variable names]
Signed-off-by: Somebhody Else 

and an obvious "wish" of theirs was to treat not just RFC2822-like
"a line that begins with token followed by a colon" but also these
short comments as part of the trailer block.  Your original wish in
[*1*] is to also treat "a line that begin with a whitespace that
follows a line that begins with token followed by a colon" as part
of the trailer block and I personally think that is a reasonable
thing to wish for, too.


If we allowed arbitrary lines in the trailer block, this would solve
my original problem, yes.


Here is an experiment I ran during my lunch break.  The script
(attached) is meant to run in the kernel repository and
for each log messages of each non-merge commit:

 * find its last paragraph, where the definition of paragraph is
   simply "a blank/empty line";

 * inspect if there is at least one RFC2822-header-looking line, or
   a line that begins with "(cherry picked from";

 * dump the ones that do not pass the above criteria.

My cursory look of the output did not spot a legitimate trailer
block that we should have identified.  The output lines shown were
ones that are not signed off at all (e.g. af8c34ce6ae32add that says
"Linux 4.7-rc2"), ones that has three-dash line "---" in them
(e.g. 133d558216d9), ones that has diffstat that should have been
after "---" (e.g. 259307074bfcf1f).

The story is the same if you run it in git.git; the "do we have at
least one rfc2822-header-looking line or '(cherry picked from' line
in the last paragraph? if so, then that is an existing trailer
block" seems to be a good heuristics to cover many cases like
these:

d0196c8d5d3057c5c21a82f3d0113ca8e501033b
Signed-off-by: Arnd Bergmann 
[tomi.valkei...@ti.com: resolved conflicts]
Signed-off-by: Tomi Valkeinen 

59f0aa9480cfef9173a648cec4537addc5f3ad94
Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916
http://bugzilla.kernel.org/show_bug.cgi?id=10100
https://lkml.org/lkml/2008/2/25/282
Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399
https://bugzilla.kernel.org/show_bug.cgi?id=12461
https://bugzilla.kernel.org/show_bug.cgi?id=11880
Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884
https://bugzilla.kernel.org/show_bug.cgi?id=14081
https://bugzilla.kernel.org/show_bug.cgi?id=14086
https://bugzilla.kernel.org/show_bug.cgi?id=14446
Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng 
Tested-by: Chris Bainbridge 
Signed-off-by: Rafael J. Wysocki 


That sounds reasonable to me. Would a patch set to implement this new 
trailer block heuristic (in both sequencer and trailer) be reasonable? 
And if yes, should trailer know about the "(cherry picked from" prefix? 
(I can see it both ways - knowing about the "(cherry picked from" prefix 
would make it consistent with sequencer, but it seems like a detail that 
it shouldn't know about. Writing "Cherry-picked-from:" instead probably 
wouldn't solve our problem because, for backwards compatibility, we 
would still need to support reading the old format.)


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> On 09/30/2016 12:34 PM, Junio C Hamano wrote:
>>> 2) The Linux kernel's repository has some "commit ... upstream." lines
>>> in this position (below the commit title) - for example, in commit
>>> dacc0987fd2e.
>>
>> "A group of people seem to prefer it there" does not lead to
>> "therefore let's move it there for everybody".  It does open a
>> possibility that we may want to add a new option to put it there,
>> but does not justify changing what existing "-x" option does.
>
> To clarify, my patch adds the new option you described (to place it
> below the title instead of at the bottom of the commit message). The
> default is still the current behavior.

Ah, sorry, I missed that.  No objection from me on this point then.

Thanks.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Tan  writes:
>
>>> I vaguely recall that there were some discussion on the definition
>>> of "what's a trailer line" with folks from the kernel land, perhaps
>>> while discussing the interpret-trailers topic.  IIRC, when somebody
>>> passes an improved version along, the resulting message's trailer
>>> block may look like this:
>>>
>>> Signed-off-by: Original Author 
>>> [fixed typo in the variable names]
>>> Signed-off-by: Somebhody Else 
>>>
>>> and an obvious "wish" of theirs was to treat not just RFC2822-like
>>> "a line that begins with token followed by a colon" but also these
>>> short comments as part of the trailer block.  Your original wish in
>>> [*1*] is to also treat "a line that begin with a whitespace that
>>> follows a line that begins with token followed by a colon" as part
>>> of the trailer block and I personally think that is a reasonable
>>> thing to wish for, too.
>>
>> If we allowed arbitrary lines in the trailer block, this would solve
>> my original problem, yes.

Here is an experiment I ran during my lunch break.  The script
(attached) is meant to run in the kernel repository and
for each log messages of each non-merge commit:

 * find its last paragraph, where the definition of paragraph is
   simply "a blank/empty line";

 * inspect if there is at least one RFC2822-header-looking line, or
   a line that begins with "(cherry picked from";

 * dump the ones that do not pass the above criteria.

My cursory look of the output did not spot a legitimate trailer
block that we should have identified.  The output lines shown were
ones that are not signed off at all (e.g. af8c34ce6ae32add that says
"Linux 4.7-rc2"), ones that has three-dash line "---" in them
(e.g. 133d558216d9), ones that has diffstat that should have been
after "---" (e.g. 259307074bfcf1f).

The story is the same if you run it in git.git; the "do we have at
least one rfc2822-header-looking line or '(cherry picked from' line
in the last paragraph? if so, then that is an existing trailer
block" seems to be a good heuristics to cover many cases like
these:

d0196c8d5d3057c5c21a82f3d0113ca8e501033b
Signed-off-by: Arnd Bergmann 
[tomi.valkei...@ti.com: resolved conflicts]
Signed-off-by: Tomi Valkeinen 

59f0aa9480cfef9173a648cec4537addc5f3ad94
Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916
http://bugzilla.kernel.org/show_bug.cgi?id=10100
https://lkml.org/lkml/2008/2/25/282
Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399
https://bugzilla.kernel.org/show_bug.cgi?id=12461
https://bugzilla.kernel.org/show_bug.cgi?id=11880
Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884
https://bugzilla.kernel.org/show_bug.cgi?id=14081
https://bugzilla.kernel.org/show_bug.cgi?id=14086
https://bugzilla.kernel.org/show_bug.cgi?id=14446
Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng 
Tested-by: Chris Bainbridge 
Signed-off-by: Rafael J. Wysocki 

-- >8 --
#!/bin/sh

git log --no-merges |
perl -e '
sub flush {
my ($commit, @lines) = @_;
my $seen_good = 0;
for (@lines) {
if (/^[-A-Za-z0-9]+: / ||
/^\(cherry picked from/) {
$seen_good = 1;
last;
}
}
if (!$seen_good) {
print "\n$commit\n";
for (@lines) {
print;
}
}
}

my (@lines, $this);
while (<>) {
if (/^commit (.*)$/) {
my $next = $1;
flush($this, @lines);
@lines = ();
$this = $next;
}
if (s/^//) {
if (/^\s*$/) {
@lines = ();
} else {
push @lines, $_;
}
}
}
if (@lines && $this) {
flush($this, @lines);
}
'


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Jonathan Tan

On 09/30/2016 12:34 PM, Junio C Hamano wrote:

2) The Linux kernel's repository has some "commit ... upstream." lines
in this position (below the commit title) - for example, in commit
dacc0987fd2e.


"A group of people seem to prefer it there" does not lead to
"therefore let's move it there for everybody".  It does open a
possibility that we may want to add a new option to put it there,
but does not justify changing what existing "-x" option does.


To clarify, my patch adds the new option you described (to place it 
below the title instead of at the bottom of the commit message). The 
default is still the current behavior.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Junio C Hamano
Jonathan Tan  writes:

>> I vaguely recall that there were some discussion on the definition
>> of "what's a trailer line" with folks from the kernel land, perhaps
>> while discussing the interpret-trailers topic.  IIRC, when somebody
>> passes an improved version along, the resulting message's trailer
>> block may look like this:
>>
>> Signed-off-by: Original Author 
>> [fixed typo in the variable names]
>> Signed-off-by: Somebhody Else 
>>
>> and an obvious "wish" of theirs was to treat not just RFC2822-like
>> "a line that begins with token followed by a colon" but also these
>> short comments as part of the trailer block.  Your original wish in
>> [*1*] is to also treat "a line that begin with a whitespace that
>> follows a line that begins with token followed by a colon" as part
>> of the trailer block and I personally think that is a reasonable
>> thing to wish for, too.
>
> If we allowed arbitrary lines in the trailer block, this would solve
> my original problem, yes.

OK.

> Looking at that, it seems that sequencer.c started interpreting the
> last paragraph of the commit message as a footer and adding an
> exception for "cherry picked from" in commit b971e04 ("sequencer.c:
> always separate "(cherry picked from" from commit body",
> 2013-02-12). So the interpretations of sequencer.c and
> interpret-trailers were already divergent, but I should have probably
> at least discussed that.

It is not too late to discuss it.  I still think it is a good longer
term plan to try to unify the definition of what a trailer block is
and the implementation of the code that determines the boundary
between the log message proper and the trailer block and that allows
us to manipulate the trailer block, that currently is scattered
across multiple places into one.  Historically, "commit -s" had one
(because it needed to decide if it needs to see if the last sign-off
is already the one it is adding, and to decide if a blank line is
needed before the sing-off being added), "am -s" had another, and
"cherry-pick" probably had one, too.  "interpret-trailers" was, at
least originally, envisioned as an effort to develop a unified
machinery that can be called from these codepaths, and to aid the
development and encourage its use, it also had its own end-user
facing command.  Your interest in the "trailer" topic may be a good
trigger for us to further that original vision.

> As for a reason:
>
> 1) I do not have a specific reason for placing it in that exact
> position, but I would like to be able to place the "cherry picked
> from" line without affecting the last paragraph (specifically, without
> making the "cherry picked from" line the only line in the last
> paragraph).
> ...
> 1a) (Avoiding the footer might also be a good way of more clearly
> defining what the footer is. For example, currently, "cherry picked
> from" is treated as a special case in sequencer.c but not in
> trailer.c, as far as I can tell. If we consistently avoided the
> footer, we wouldn't need such a special case anywhere.)

That is one of the numerous shortcomings of the "interpret-trailers"
that is still not finished, I would say.

> 2) The Linux kernel's repository has some "commit ... upstream." lines
> in this position (below the commit title) - for example, in commit
> dacc0987fd2e.

"A group of people seem to prefer it there" does not lead to
"therefore let's move it there for everybody".  It does open a
possibility that we may want to add a new option to put it there,
but does not justify changing what existing "-x" option does.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Jonathan Tan

On 09/29/2016 02:56 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


This is somewhat of a follow-up to my previous e-mail with subject
"[PATCH] sequencer: support folding in rfc2822 footer" [1], in which I
proposed relaxing the definition of a commit message footer to allow
multiple-line field bodies (as described in RFC2822), but its strictness
was deemed deliberate.


It does not necessarily mean we can never change it when we did
something deliberately, though.  With a good enough justification,
and with a transitition plan if the backward incompatibility is
severe enough to warrant one, we can change things.

I vaguely recall that there were some discussion on the definition
of "what's a trailer line" with folks from the kernel land, perhaps
while discussing the interpret-trailers topic.  IIRC, when somebody
passes an improved version along, the resulting message's trailer
block may look like this:

Signed-off-by: Original Author 
[fixed typo in the variable names]
Signed-off-by: Somebhody Else 

and an obvious "wish" of theirs was to treat not just RFC2822-like
"a line that begins with token followed by a colon" but also these
short comments as part of the trailer block.  Your original wish in
[*1*] is to also treat "a line that begin with a whitespace that
follows a line that begins with token followed by a colon" as part
of the trailer block and I personally think that is a reasonable
thing to wish for, too.


If we allowed arbitrary lines in the trailer block, this would solve my 
original problem, yes.



I recall that I was somewhat surprised and dissapointed to see no
change to interpret-trailers when you tried [*1*], which was really
about improving the definition of what the trailer block is, by the
way.


Sorry, I had missed that.

Looking at that, it seems that sequencer.c started interpreting the last 
paragraph of the commit message as a footer and adding an exception for 
"cherry picked from" in commit b971e04 ("sequencer.c: always separate 
"(cherry picked from" from commit body", 2013-02-12). So the 
interpretations of sequencer.c and interpret-trailers were already 
divergent, but I should have probably at least discussed that.



Below is a patch set that allows placing the "cherry picked from" line
without taking into account the definition of a commit message footer.
For example, "git cherry-pick -x" (with the appropriate configuration
variable or argument) would, to this commit message:

  commit title

  This is an explanatory paragraph.

  Footer: foo

place the "(cherry picked from ...)" line below "commit title".

Would this be better?


It is not immediately obvious what such a change buys us.  Wouldn't
the current code place that line below "Footer: foo"?  I cannot
think of any reason why anybody would want to place "cherry-picked
from" immediately below the title and before the first line of the
body.


Yes, the current code would place it below "Footer: foo" without a blank 
line before it, but if it thinks that the so-called footer is not 
actually a footer, it would insert a blank line before that line.


As for a reason:

1) I do not have a specific reason for placing it in that exact 
position, but I would like to be able to place the "cherry picked from" 
line without affecting the last paragraph (specifically, without making 
the "cherry picked from" line the only line in the last paragraph). It 
seems to me that placing it below the title was the most straightforward 
way to do that - this way, Git can have its own idea of what a footer 
constitutes, and the user can treat it as completely separate from the 
"cherry picked from" line mechanism.


1a) (Avoiding the footer might also be a good way of more clearly 
defining what the footer is. For example, currently, "cherry picked 
from" is treated as a special case in sequencer.c but not in trailer.c, 
as far as I can tell. If we consistently avoided the footer, we wouldn't 
need such a special case anywhere.)


2) The Linux kernel's repository has some "commit ... upstream." lines 
in this position (below the commit title) - for example, in commit 
dacc0987fd2e.


[1] 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/dacc0987fd2e25a8b4b8c19778862ba12ce76d0a


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-29 Thread Junio C Hamano
Jonathan Tan  writes:

> This is somewhat of a follow-up to my previous e-mail with subject
> "[PATCH] sequencer: support folding in rfc2822 footer" [1], in which I
> proposed relaxing the definition of a commit message footer to allow
> multiple-line field bodies (as described in RFC2822), but its strictness
> was deemed deliberate.

It does not necessarily mean we can never change it when we did
something deliberately, though.  With a good enough justification,
and with a transitition plan if the backward incompatibility is
severe enough to warrant one, we can change things.

I vaguely recall that there were some discussion on the definition
of "what's a trailer line" with folks from the kernel land, perhaps
while discussing the interpret-trailers topic.  IIRC, when somebody
passes an improved version along, the resulting message's trailer
block may look like this:

Signed-off-by: Original Author 
[fixed typo in the variable names]
Signed-off-by: Somebhody Else 

and an obvious "wish" of theirs was to treat not just RFC2822-like
"a line that begins with token followed by a colon" but also these
short comments as part of the trailer block.  Your original wish in
[*1*] is to also treat "a line that begin with a whitespace that
follows a line that begins with token followed by a colon" as part
of the trailer block and I personally think that is a reasonable
thing to wish for, too.

I recall that I was somewhat surprised and dissapointed to see no
change to interpret-trailers when you tried [*1*], which was really
about improving the definition of what the trailer block is, by the
way.

In any case, if we want to improve what the trailer block is, we
would certainly need to make sure what is inserted by "cherry-pick -x"
is also considered as part of the trailer block, so it may be necessary
to change it to "Cherry-picked-from: ..." while doing so.  I dunno.

> Below is a patch set that allows placing the "cherry picked from" line
> without taking into account the definition of a commit message footer.
> For example, "git cherry-pick -x" (with the appropriate configuration
> variable or argument) would, to this commit message:
>
>   commit title
>
>   This is an explanatory paragraph.
>
>   Footer: foo
>
> place the "(cherry picked from ...)" line below "commit title".
>
> Would this be better?

It is not immediately obvious what such a change buys us.  Wouldn't
the current code place that line below "Footer: foo"?  I cannot
think of any reason why anybody would want to place "cherry-picked
from" immediately below the title and before the first line of the
body.


[Footnotes]

*1* 
http://public-inbox.org/git/1472846322-5592-1-git-send-email-jonathanta...@google.com/