Re: [BUG] submodule config does not apply to upper case submodules?

2017-02-15 Thread Jonathan Tan

On 02/15/2017 03:11 PM, Junio C Hamano wrote:

Junio C Hamano  writes:

Perhaps something like this?


This looks good. I was hoping to unify the processing logic between this 
CLI parsing and the usual stream parsing, but this approach is probably 
simpler.



 config.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index c6b874a7bf..98bf8fee32 100644
--- a/config.c
+++ b/config.c
@@ -201,6 +201,20 @@ void git_config_push_parameter(const char *text)
strbuf_release();
 }

+static void canonicalize_config_variable_name(struct strbuf *var)
+{
+   char *first_dot = strchr(var->buf, '.');
+   char *last_dot = strrchr(var->buf, '.');
+   char *cp;
+
+   if (first_dot)
+   for (cp = var->buf; *cp && cp < first_dot; cp++)


"*cp &&" is unnecessary, as far as I can tell.


+   *cp = tolower(*cp);
+   if (last_dot)
+   for (cp = last_dot; *cp; cp++)
+   *cp = tolower(*cp);
+}
+
 int git_config_parse_parameter(const char *text,
   config_fn_t fn, void *data)
 {
@@ -223,7 +237,7 @@ int git_config_parse_parameter(const char *text,
strbuf_list_free(pair);
return error("bogus config parameter: %s", text);
}
-   strbuf_tolower(pair[0]);
+   canonicalize_config_variable_name(pair[0]);
if (fn(pair[0]->buf, value, data) < 0) {
strbuf_list_free(pair);
return -1;




Re: [BUG] submodule config does not apply to upper case submodules?

2017-02-15 Thread Jonathan Tan

On 02/15/2017 10:53 AM, Junio C Hamano wrote:

Lars Schneider  writes:


It looks like as if submodule configs ("submodule.*") for submodules
with upper case names are ignored.


This observation is surprising, as the second level in three-level
names like ".." is designed to be case
sensitive.  A code that uses the config API needs to do extra things
to cause the behaviour you showed, i.e. to get submodule.U.update
ignored while submodule.l.update to be honoured.  Perhaps somebody
downcases things too aggressively before comparing?

This is worth making it work as expected, needless to say ;-)


I had some time to look into this, and yes, command-line parameters are 
too aggressively downcased ("git_config_parse_parameter" calls 
"strbuf_tolower" on the entire key part in config.c). Updating the 
original patch to use "test_global_config" makes the test pass, and 
commenting out the "strbuf_tolower" line in config.c also makes the test 
pass.


I'll see if I can fix this.


Re: [PATCH 6/7] grep: avoid resolving revision names in --no-index case

2017-02-14 Thread Jonathan Tan

On 02/14/2017 10:04 AM, Jeff King wrote:

On Tue, Feb 14, 2017 at 08:53:04AM -0800, Jonathan Tan wrote:


On 02/13/2017 10:07 PM, Jeff King wrote:

diff --git a/builtin/grep.c b/builtin/grep.c
index e83b33bda..c4c632594 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1176,6 +1176,12 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
break;
}

+   if (!use_index) {
+   if (seen_dashdash)
+   die(_("--no-index cannot be used with revs"));


There is a subsequent check that prints "--no-index or --untracked cannot be
used with revs." - maybe we should just expand this part to incorporate that
case. (That is, write `if (!use_index || untracked)` instead of `if
(!use_index)`.) This also allows us to preserve the error message, which
might be useful for someone using a translated version of Git.


I wasn't sure if we wanted to treat "untracked" in the same way.
Certainly we can catch the error here for the seen_dashdash case, but is
it also the case that:

  echo content >master
  git grep --untracked pattern master

should treat "master" as a path?

-Peff


It is already the case that it cannot be treated as a rev:

  $ git grep --untracked pattern master
  fatal: --no-index or --untracked cannot be used with revs.

So I think it would be better if it was treated as a path, for 
consistency with --no-index. I'm OK either way, though.


Re: [PATCH 0/7] grep rev/path parsing fixes

2017-02-14 Thread Jonathan Tan

On 02/13/2017 10:00 PM, Jeff King wrote:

I've fixed that, along with a few other bugs and cleanups. The complete
series is below. Patch 2 is your (untouched) patch. My suggestions for
your test are in patch 3, which can either remain on its own or be
squashed in.

  [1/7]: grep: move thread initialization a little lower
  [2/7]: grep: do not unnecessarily query repo for "--"
  [3/7]: t7810: make "--no-index --" test more robust
  [4/7]: grep: re-order rev-parsing loop
  [5/7]: grep: fix "--" rev/pathspec disambiguation
  [6/7]: grep: avoid resolving revision names in --no-index case
  [7/7]: grep: do not diagnose misspelt revs with --no-index


Thanks - these look good to me. I replied to 6/7 with a comment, but I 
also think that these are good as-is. Also, 3/7 can probably be squashed in.


Re: [PATCH 6/7] grep: avoid resolving revision names in --no-index case

2017-02-14 Thread Jonathan Tan

On 02/13/2017 10:07 PM, Jeff King wrote:

diff --git a/builtin/grep.c b/builtin/grep.c
index e83b33bda..c4c632594 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1176,6 +1176,12 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
break;
}

+   if (!use_index) {
+   if (seen_dashdash)
+   die(_("--no-index cannot be used with revs"));


There is a subsequent check that prints "--no-index or --untracked 
cannot be used with revs." - maybe we should just expand this part to 
incorporate that case. (That is, write `if (!use_index || untracked)` 
instead of `if (!use_index)`.) This also allows us to preserve the error 
message, which might be useful for someone using a translated version of 
Git.



+   break;
+   }
+
if (get_sha1_with_context(arg, 0, sha1, )) {
if (seen_dashdash)
die(_("unable to resolve revision: %s"), arg);


[PATCH for NEXT] grep: do not unnecessarily query repo for "--"

2017-02-13 Thread Jonathan Tan
When running a command of the form

  git grep --no-index pattern -- path

in the absence of a Git repository, an error message will be printed:

  fatal: BUG: setup_git_env called without repository

This is because "git grep" tries to interpret "--" as a rev. "git grep"
has always tried to first interpret "--" as a rev for at least a few
years, but this issue was upgraded from a pessimization to a bug in
commit 59332d1 ("Resurrect "git grep --no-index"", 2010-02-06), which
calls get_sha1 regardless of whether --no-index was specified. This bug
appeared to be benign until commit b1ef400 ("setup_git_env: avoid blind
fall-back to ".git"", 2016-10-20) when Git was taught to die in this
situation.  (This "git grep" bug appears to be one of the bugs that
commit b1ef400 is meant to flush out.)

Therefore, always interpret "--" as signaling the end of options,
instead of trying to interpret it as a rev first.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---

There is probably a similar bug for commands of the form:

  git grep --no-index pattern foo

If there is a repo and "foo" is a rev, the "--no-index or --untracked
cannot be used with revs." error would occur. If there is a repo and
"foo" is not a rev, this command would proceed as usual. If there is no
repo, the "setup_git_env called without repository" error would occur.
(This is my understanding from reading the code - I haven't tested it
out.)

This patch does not fix this similar bug, but I decided to send it out
anyway because it still fixes a bug and unlocks the ability to
specify paths with "git grep --no-index".

 builtin/grep.c  |  9 +
 t/t7810-grep.sh | 12 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..1b68d1638 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1154,6 +1154,11 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
const char *arg = argv[i];
unsigned char sha1[20];
struct object_context oc;
+   if (!strcmp(arg, "--")) {
+   i++;
+   seen_dashdash = 1;
+   break;
+   }
/* Is it a rev? */
if (!get_sha1_with_context(arg, 0, sha1, )) {
struct object *object = parse_object_or_die(sha1, arg);
@@ -1162,10 +1167,6 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
add_object_array_with_path(object, arg, , oc.mode, 
oc.path);
continue;
}
-   if (!strcmp(arg, "--")) {
-   i++;
-   seen_dashdash = 1;
-   }
break;
}
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 19f0108f8..29202f0e7 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -982,6 +982,18 @@ test_expect_success 'grep -e -- -- path' '
test_cmp expected actual
 '
 
+test_expect_success 'grep --no-index pattern -- path' '
+   rm -fr non &&
+   mkdir -p non/git &&
+   (
+   GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+   export GIT_CEILING_DIRECTORIES &&
+   cd non/git &&
+   echo hello >hello &&
+   git grep --no-index o -- .
+   )
+'
+
 cat >expected <

Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)

2017-02-07 Thread Jonathan Tan
Looking back at the comments I have received in reply, I think that
there were two major concerns: (i) the case where a server ACKs a client
"have" line and the client forever thinks that the server has it, but it
may not be the case for future servers (or future invocations of the
same server), and (ii) what the client does with 2 "versions" of remote
refs.

For (i), the issue already exists and as far as I can tell, this patch
set does not directly impact it, positively or negatively. The
"have"/"ACK" part of negotiation is kept the same - the only difference
in this patch set is that wants can be specified by name instead of by
SHA-1 hash. This patch set does not help the "have"/"ACK" part of
negotiation, but it helps the "want" part.

For (ii), I have prepared a patch to be squashed, and extended the
commit message with an explanation of what is happening. (The commit
message and the patch are appended to this e-mail).

(There was also some discussion of the client being required to send
exact matches in its "want-ref" lines.)

Please let me know if you have any other opinions or thoughts. It does
seem to me like such a protocol update (or something similar) would help
for large repositories with many ever-changing refs (like refs/changes
in Gerrit or refs/pull in GitHub) - and the creation of a ref would look
like a deletion depending on the order in which we access the servers in
a load-balancing arrangement and the order in which those servers
synchronize themselves. And deletion of refs does not work with the
current protocol, but would work with a protocol that supports wildcards
(like this one).

-- 8< --

fetch: send want-ref and receive fetched refs

Teach fetch to send refspecs to the underlying transport, and teach all
components used by the HTTP transport (remote-curl, transport-helper,
fetch-pack) to understand and propagate the names and SHA-1s of the refs
fetched.

The do_fetch method in builtin/fetch.c originally had only one
remote-local ref map, generated from the already-fetched list of remote
refs. This patch introduces a new ref map generated from the list of
fetched refs. Usages of the list of remote refs or the remote-local ref
map are updated as follows:
 - check_not_current_branch (which checks that the current branch is not
   affected by the fetch) is performed both on the original ref map (to
   die before the fetch if we can, as an optimization) and on the new
   ref map (since the new ref map is the one actually applied).
 - Pruning is done based on the new ref map.
 - backfill_tags (for tag following) is performed using the original
   list of remote refs because the new list of fetched refs is not
   guaranteed to contain tag information. (Since backfill_tags performs
   another fetch, it does not need to be fully consistent with the
   just-returned packfile.)
The list of remote refs and the remote-local ref map are not otherwise
used by do_fetch or any function in its invocation chain (fetch_one and
cmd_fetch).
---
 builtin/fetch.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 87de00e49..b8432394c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1177,6 +1177,20 @@ static int do_fetch(struct transport *transport,
 
if (tags == TAGS_DEFAULT && autotags)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
+   if (fetch_refs(transport, e_rs, e_rs_nr, ref_map, _remote_refs)) {
+   free_refs(ref_map);
+   retcode = 1;
+   goto cleanup;
+   }
+   if (new_remote_refs) {
+   free_refs(ref_map);
+   ref_map = get_ref_map(transport->remote, new_remote_refs,
+ refs, ref_count, tags, autotags);
+   if (!update_head_ok)
+   check_not_current_branch(ref_map);
+   free_refs(new_remote_refs);
+   }
+
if (prune) {
/*
 * We only prune based on refspecs specified
@@ -1192,18 +1206,6 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, e_rs, e_rs_nr, ref_map, _remote_refs)) {
-   free_refs(ref_map);
-   retcode = 1;
-   goto cleanup;
-   }
-   if (new_remote_refs) {
-   free_refs(ref_map);
-   ref_map = get_ref_map(transport->remote, new_remote_refs,
- refs, ref_count, tags, autotags);
-   free_refs(new_remote_refs);
-   }
-
if (consume_refs(transport, ref_map)) {
free_refs(ref_map);
retcode = 1;
-- 
2.11.0.483.g087da7b7c-goog



Re: Deadlock between git-remote-http and git fetch-pack

2017-01-27 Thread Jonathan Tan

On 01/27/2017 02:31 PM, tsuna wrote:

Hi there,
While investigating a hung job in our CI system today, I think I found
a deadlock in git-remote-http

Git version: 2.9.3
Linux (amd64) kernel 4.9.0

Excerpt from the process list:

jenkins  27316  0.0  0.0  18508  6024 ?S19:30   0:00  |
   \_ git -C ../../../arista fetch --unshallow
jenkins  27317  0.0  0.0 169608 10916 ?S19:30   0:00  |
   \_ git-remote-http origin http://gerrit/arista
jenkins  27319  0.0  0.0  24160  8260 ?S19:30   0:00  |
   \_ git fetch-pack --stateless-rpc --stdin
--lock-pack --include-tag --thin --no-progress --depth=2147483647
http://gerrit/arista/

Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
parent, PID 27317 (git-remote-http) is stuck reading on its child’s
stdout.  Nothing has moved for like 2h, it’s deadlocked.


strace -fp 27319

strace: Process 27319 attached
read(0,

Here FD 0 is a pipe:

~ @8a33a534e2f7> lsof -np 27319 | grep 0r
git 27319 jenkins0r  FIFO   0,10  0t0 354519158 pipe

The writing end of which is owned by the parent process:

~ @8a33a534e2f7> lsof -n 2>/dev/null | fgrep 354519158
git-remot 27317jenkins4w FIFO   0,10  0t0
354519158 pipe
git   27319jenkins0r FIFO   0,10  0t0
354519158 pipe

And the parent process (git-remote-http) is stuck reading from another FD:


strace -fp 27317

strace: Process 27317 attached
read(5,

And here FD 5 is another pipe:

~ @8a33a534e2f7> lsof -np 27317 | grep 5r
git-remot 27317 jenkins5r  FIFO   0,10  0t0 354519159 pipe

Which is the child’s stdout:


lsof -n 2>/dev/null | fgrep 354519159

git-remot 27317jenkins5r FIFO   0,10  0t0
354519159 pipe
git   27319jenkins1w FIFO   0,10  0t0
354519159 pipe

Hence the deadlock.

Stack trace in git-remote-http:

(gdb) bt
#0  0x7f04f1e1363d in read () from target:/lib64/libpthread.so.0
#1  0x562417472d73 in xread ()
#2  0x562417472f2b in read_in_full ()
#3  0x562417438a6e in get_packet_data ()
#4  0x562417439129 in packet_read ()
#5  0x5624174245e0 in rpc_service ()
#6  0x562417424f10 in fetch_git ()
#7  0x5624174233fd in main ()

Stack trace in git fetch-pack:

(gdb) bt
#0  0x7fb3ab478620 in __read_nocancel () from target:/lib64/libpthread.so.0
#1  0x55f688827283 in xread ()
#2  0x55f68882743b in read_in_full ()
#3  0x55f6887ce35e in get_packet_data ()
#4  0x55f6887cea19 in packet_read ()
#5  0x55f6887ceb90 in packet_read_line ()
#6  0x55f68879dd05 in get_ack ()
#7  0x55f68879f6b4 in fetch_pack ()
#8  0x55f688710619 in cmd_fetch_pack ()
#9  0x55f6886dff7b in handle_builtin ()
#10 0x55f6886df026 in main ()

I looked at the diff between v2.9.3 and HEAD on fetch-pack.c and
remote-curl.c and didn’t see anything noteworthy in that area of the
code, so I presume the bug is still there in master.



I haven't looked into this in detail, but this might be related to 
something I discovered while writing my patch set. I noticed that 
upload-pack (the process on the "other side" of fetch-pack) can die 
without first writing any notification, causing fetch-pack to block 
forever on a read. A fix would probably look like that patch [1].


[1] 



Re: [RFC 03/14] upload-pack: test negotiation with changing repo

2017-01-26 Thread Jonathan Tan

On 01/26/2017 02:33 PM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:


diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh
new file mode 100644
index 0..060ec0300
--- /dev/null
+++ b/t/lib-httpd/one-time-sed.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+if [ -e one-time-sed ]; then
+   "$GIT_EXEC_PATH/git-http-backend" | sed "$(cat one-time-sed)"
+   rm one-time-sed
+else
+   "$GIT_EXEC_PATH/git-http-backend"
+fi


CodingGuidelines?


Thanks - done locally and will send out in the next reroll.


+inconsistency() {
+   # Simulate that the server initially reports $2 as the ref
+   # corresponding to $1, and after that, $1 as the ref corresponding to
+   # $1. This corresponds to the real-life situation where the server's
+   # repository appears to change during negotiation, for example, when
+   # different servers in a load-balancing arrangement serve (stateless)
+   # RPCs during a single negotiation.
+   printf "s/%s/%s/" \
+  $(git -C "$REPO" rev-parse $1 | tr -d "\n") \
+  $(git -C "$REPO" rev-parse $2 | tr -d "\n") \
+  >"$HTTPD_ROOT_PATH/one-time-sed"


I'd prefer for the printf'd result to have a final LF (i.e. not
leaving the resulting one-time-sed with a final incomplete line).
Also, do you really need the pipe to tr-d?  Doesn't the result of
$(command substitution) omit the final LF anyway?

$ printf '1 %s 2 %s 3\n' "$(echo foo)" "$(echo bar)"; echo OK
1 foo 2 bar 3
OK


Done.


diff --git a/upload-pack.c b/upload-pack.c
index b88ed8e26..0678c53d6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -862,9 +862,13 @@ static void receive_needs(struct string_list 
*wanted_ns_refs)
} else if (skip_prefix(line, "want ", ) &&
   !get_sha1_hex(arg, sha1_buf)) {
o = parse_object(sha1_buf);
-   if (!o)
+   if (!o) {
+   packet_write_fmt(1,
+"ERR upload-pack: not our ref 
%s",
+sha1_to_hex(sha1_buf));
die("git upload-pack: not our ref %s",
sha1_to_hex(sha1_buf));
+   }


This somehow looks like a good thing to do even in production.  Am I
mistaken?


Yes, that's true. If this patch set stalls (for whatever reason), I'll 
spin this off into an independent patch.


Re: [RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-26 Thread Jonathan Tan



On 01/26/2017 02:23 PM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:


Currently, while performing packfile negotiation [1], upload-pack allows
clients to specify their desired objects only as SHA-1s. This causes:
(a) vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, upload-pack is
provided by multiple Git servers in a load-balancing arrangement,
and
(b) dependence on the server first publishing a list of refs with
associated objects.

To eliminate (a) and take a step towards eliminating (b), teach
upload-pack to support requests in the form of ref names and globs (in
addition to the existing support for SHA-1s) through a new line of the
form "want-ref " where ref is the full name of a ref, a glob
pattern, or a SHA-1. At the conclusion of negotiation, the server will
write "wanted-ref  " for all requests that have been
specified this way.


I am not sure if this "at the conclusion of" is sensible.  It is OK
to assume that what the client side has is fixed, and it is probably
OK to desire that what the server side has can change, but at the
same time, it feels quite fragile to move the goalpost in between.


Do you have any specific concerns as to this fragility? Peff mentioned 
some concerns with the client making some decisions based on the initial 
SHA-1 vs the SHA-1 reported by "wanted-ref", to which I replied [1].



Stepping back a bit, in an environment that involves multiple server
instances that have inconsistent set of refs, can the negotiation
even be sensibly and safely implemented?  The first server the
client contacts may, in response to a "have", say "I do have that
commit so you do not have to send its ancestors to me.  We found one
cut-off point.  Please do explore other lines of histories."  The
next server that concludes the negotiation exchange may not have
that commit and will be unable to produce a pack that excludes the
objects reachable from that commit---wouldn't that become a problem?


It's true that this patch set wouldn't solve this problem. This problem 
only occurs when there is a commit that the client knows but only a few 
of the servers know (maybe because the client just pushed it to one of 
them). If, for example, the client does not know a commit and only a few 
of the servers know it (for example, because another user just pushed 
it), this patch set does help. The latter scenario seems like it would 
occur relatively commonly.



One way to prevent such a problem from hurting clients may be for
these multiple server instances to coordinate and make sure they
have a shared perception of the common history among them.  Some
pushes may have come to one instance but may not have propagated to
other instances, and such a commit cannot be accepted as usable
"have" if the servers anticipate that the final client request would
go to any of the servers.  Otherwise the multiple server arrangement
would not work safely, methinks.

And if the servers are ensuring the safety using such a mechanism,
they can use the same mechanism to restrain "faster" instances from
sending too fresh state of refs that other instances haven't caught
up to, which would mean they can present a consistent set of refs to
the client in the first place, no?

So I am not sure if the mechanism to request history by refname
instead of the tip commit would help the multi-server environment as
advertised.  It may help solving other problems, though (e.g. like
"somebody pushed to update after the initial advertisement was sent
out" which can happen even in a single server environment).


This patch set would solve the problem you describe (whether in a single 
server environment or the coordination between multiple servers that 
provides "strong consistency"). (Although it may not be an important 
problem to solve, since it is probably OK if the client got a "slow" 
version of the state of the refs.)



To be flexible with respect to client needs, the server does not
indicate an error if a "want-ref" line corresponds to no refs, but
instead relies on the client to ensure that what the user needs has been
fetched. For example, a client could reasonably expand an abbreviated
name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
refs/tags/foo", among others, and ensure that at least one such ref has
been fetched.


Cute.  This may be one way to implement the DWIM thing within the
constraint of eventually wanting to go to "client speaks first, the
server does not advertise things the client is not interested in"
world.

But at the same time it may end up bloating the set of refs the
client asks instead.  Instead of receiving the advertisement and
then sending one request after picking the matching one from it,
t

Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)

2017-01-26 Thread Jonathan Tan

Thanks for your comments.

On 01/26/2017 03:00 PM, Jeff King wrote:

On Wed, Jan 25, 2017 at 02:02:53PM -0800, Jonathan Tan wrote:


Negotiation currently happens by upload-pack initially sending a list of
refs with names and SHA-1 hashes, and then several request/response
pairs in which the request from fetch-pack consists of SHA-1 hashes
(selected from the initial list). Allowing the request to consist of
names instead of SHA-1 hashes increases tolerance to refs changing
(due to time, and due to having load-balanced servers without strong
consistency),


Interesting. My big question is: what happens when a ref _does_ change?
How does the client handle this?

The existing uploadpack.allowReachableSHA1InWant is there to work around
the problem that an http client may get a ref advertisement in one step,
and then come back later to do the want/have negotiation, at which point
the server has moved on (or maybe it's even a different server). There
the client says "I want sha1 X", and the server needs to say "well, X
isn't my tip now, but it's still acceptable for you to fetch".

But this seems to go in the opposite direction. After the advertisement,
the client decides "OK, I want to fetch refs/heads/master which is at
SHA1 X". It connects to the server and says "I want refs/heads/master".
Let's say the server has moved its version of the ref to SHA1 Y.

What happens? I think the server will say "wanted-ref master Y". Does
the client just decide to use "Y" then?  How does that interact with any
decisions the client might have made about X? I guess things like
fast-forwards have to be decided after we fetch the objects anyway
(since we cannot compute them until we get the pack), so maybe there
aren't any such decisions. I haven't checked.


Yes, the server will say "wanted-ref master Y". The relevant code 
regarding the decisions the client makes regarding X or Y is in do_fetch 
in builtin/fetch.c.


There, I can see these decisions done using X:
 - check_not_current_branch (forbidding fetching that modifies the
   current branch) (I just noticed that this has to be done for Y too,
   and will do so)
 - prune refs [*]
 - automatic tag following [*]

[*] X and Y may differ in that one relevant ref appears in one set but 
not in the other (because a ref was added or removed in the meantime), 
causing a different result if these decisions were to be done using Y, 
but I think that it is OK either way.


Fetch optimizations (for example, everything_local in fetch-pack.c) that 
check if the client really needs to fetch are also done using X, of 
course (and if the optimization succeeds, there is no Y).


Fast-forwards (and everything else in store_updated_refs) are decided 
using Y.



and is a step towards eliminating the need for the server
to send the list of refs first (possibly improving performance).


I'm not sure it is all that useful towards that end. You still have to
break compatibility so that the client tells the server to suppress the
ref advertisement. After that, it is just a question of asking for the
refs. And you have two options:

  1. Ask the server to tell you the values of some subset of the refs,
 pick what you want, and then do the want/have as normal.

  2. Go straight to the want/have, but tell the server the refs you want
 instead of their sha1s.

I think your approach here would lead to (2).

But (1), besides being closer to how the protocol works now, seems like
it's more flexible. I can ask about the ref state without necessarily
having to retrieve the objects. How would you write git-ls-remote with
such a system?


Assuming a new protocol with the appropriate backwards compatibility 
(which would have to be done for both options), (2) does save a 
request/response pair (because we can send the ref names directly as 
"want-ref" in the initial request instead of sending ref names in the 
initial request and then confirming them using "want " in the 
subsequent request). Also, (2) is more tolerant towards refs changing 
over time. (1) might be closer to the current protocol, but I think that 
the difference is not so significant (only in "want-ref" vs "want" 
request and the "wanted-ref" response).


As for git-ls-remote, I currently think that it would have to use the 
existing protocol.



[1] There has been some discussion about whether the server should
accept partial ref names, e.g. [2]. In this patch set, I have made the
server only accept full names, and it is the responsibility of the
client to send the multiple patterns which it wants to match. Quoting
from the commit message of the second patch:

  For example, a client could reasonably expand an abbreviated
  name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
  refs/tags/foo", among others, and ensure that at least one such ref has
  been fetc

Re: [RFC 12/14] fetch-pack: do not printf after closing stdout

2017-01-26 Thread Jonathan Tan

On 01/25/2017 04:50 PM, Stefan Beller wrote:

On Wed, Jan 25, 2017 at 2:03 PM, Jonathan Tan <jonathanta...@google.com> wrote:

In fetch-pack, during a stateless RPC, printf is invoked after stdout is
closed. Update the code to not do this, preserving the existing
behavior.


This seems to me as if it could go as an independent
bugfix(?) or refactoring as this seems to be unclear from the code?


The subsequent patches in this patch set are dependent on this patch, 
but it's true that this could be sent out on its own first.


I'm not sure if bugfix is the right word, since the existing behavior is 
correct (except perhaps that we rely on the fact that printf after 
closing stdout does effectively nothing).


[RFC 06/14] fetch: refactor to make function args narrower

2017-01-25 Thread Jonathan Tan
Refactor find_non_local_tags and get_ref_map to only take the
information they need instead of the entire transport struct. Besides
improving code clarity, this also improves their flexibility, which will
be needed in a future patch.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 43e35c494..ae7c6daa1 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -220,7 +220,7 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
return 0;
 }
 
-static void find_non_local_tags(struct transport *transport,
+static void find_non_local_tags(const struct ref *refs,
struct ref **head,
struct ref ***tail)
 {
@@ -230,7 +230,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, _refs);
-   for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
+   for (ref = refs; ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -302,7 +302,8 @@ static void find_non_local_tags(struct transport *transport,
string_list_clear(_refs, 0);
 }
 
-static struct ref *get_ref_map(struct transport *transport,
+static struct ref *get_ref_map(const struct remote *remote,
+  const struct ref *remote_refs,
   struct refspec *refspecs, int refspec_count,
   int tags, int *autotags)
 {
@@ -314,8 +315,6 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport);
-
struct string_list existing_refs = STRING_LIST_INIT_DUP;
 
if (refspec_count) {
@@ -355,8 +354,8 @@ static struct ref *get_ref_map(struct transport *transport,
fetch_refspec = parse_fetch_refspec(refmap_nr, 
refmap_array);
fetch_refspec_nr = refmap_nr;
} else {
-   fetch_refspec = transport->remote->fetch;
-   fetch_refspec_nr = transport->remote->fetch_refspec_nr;
+   fetch_refspec = remote->fetch;
+   fetch_refspec_nr = remote->fetch_refspec_nr;
}
 
for (i = 0; i < fetch_refspec_nr; i++)
@@ -365,7 +364,6 @@ static struct ref *get_ref_map(struct transport *transport,
die("--refmap option is only meaningful with command-line 
refspec(s).");
} else {
/* Use the defaults */
-   struct remote *remote = transport->remote;
struct branch *branch = branch_get(NULL);
int has_merge = branch_has_merge_config(branch);
if (remote &&
@@ -404,7 +402,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* also fetch all tags */
get_fetch_map(remote_refs, tag_refspec, , 0);
else if (tags == TAGS_DEFAULT && *autotags)
-   find_non_local_tags(transport, _map, );
+   find_non_local_tags(remote_refs, _map, );
 
/* Now append any refs to be updated opportunistically: */
*tail = orefs;
@@ -1083,6 +1081,7 @@ static int do_fetch(struct transport *transport,
struct ref *ref_map;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
+   const struct ref *remote_refs;
 
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
@@ -1101,7 +1100,9 @@ static int do_fetch(struct transport *transport,
goto cleanup;
}
 
-   ref_map = get_ref_map(transport, refs, ref_count, tags, );
+   remote_refs = transport_get_remote_refs(transport);
+   ref_map = get_ref_map(transport->remote, remote_refs, refs, ref_count,
+ tags, );
if (!update_head_ok)
check_not_current_branch(ref_map);
 
@@ -1134,7 +1135,7 @@ static int do_fetch(struct transport *transport,
if (tags == TAGS_DEFAULT && autotags) {
struct ref **tail = _map;
ref_map = NULL;
-   find_non_local_tags(transport, _map, );
+   find_non_local_tags(remote_refs, _map, );
if (ref_map)
backfill_tags(transport, ref_map);
free_refs(ref_map);
-- 
2.11.0.483.g087da7b7c-goog



[RFC 10/14] fetch-pack: support partial names and globs

2017-01-25 Thread Jonathan Tan
Teach fetch-pack to support partial ref names and ref patterns as input.

This does not use "want-ref" yet - support for that will be added in a
future patch.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch-pack.c  | 40 -
 remote.c  | 55 +++
 remote.h  | 16 +++
 t/t5500-fetch-pack.sh | 38 +++
 4 files changed, 122 insertions(+), 27 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index a18fd0c44..5f14242ae 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -11,32 +11,12 @@ static const char fetch_pack_usage[] =
 "[--include-tag] [--upload-pack=] [--depth=] "
 "[--no-progress] [--diag-url] [-v] [:] [...]";
 
-static void add_sought_entry(const struct ref ***sought, int *nr, int *alloc,
+static void add_sought_entry(struct refspec **sought, int *nr, int *alloc,
 const char *name)
 {
-   struct ref *ref;
-   struct object_id oid;
-
-   if (!get_oid_hex(name, )) {
-   if (name[GIT_SHA1_HEXSZ] == ' ') {
-   /*  , find refname */
-   name += GIT_SHA1_HEXSZ + 1;
-   } else if (name[GIT_SHA1_HEXSZ] == '\0') {
-   ; /* , leave sha1 as name */
-   } else {
-   /* , clear cruft from oid */
-   oidclr();
-   }
-   } else {
-   /* , clear cruft from get_oid_hex */
-   oidclr();
-   }
-
-   ref = alloc_ref(name);
-   oidcpy(>old_oid, );
(*nr)++;
ALLOC_GROW(*sought, *nr, *alloc);
-   (*sought)[*nr - 1] = ref;
+   parse_ref_or_pattern(&(*sought)[*nr - 1], name);
 }
 
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
@@ -44,8 +24,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
int i, ret;
struct ref *ref = NULL;
const char *dest = NULL;
-   const struct ref **sought = NULL;
+   struct refspec *sought = NULL;
int nr_sought = 0, alloc_sought = 0;
+   const struct ref **sought_refs;
+   int nr_sought_refs;
int fd[2];
char *pack_lockfile = NULL;
char **pack_lockfile_ptr = NULL;
@@ -195,8 +177,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
return args.diag_url ? 0 : 1;
}
get_remote_heads(fd[0], NULL, 0, , 0, NULL, );
+   get_ref_array(_refs, _sought_refs, ref, sought, nr_sought);
 
-   ref = fetch_pack(, fd, conn, ref, dest, sought, nr_sought,
+   ref = fetch_pack(, fd, conn, ref, dest, sought_refs, 
nr_sought_refs,
 , pack_lockfile_ptr);
if (pack_lockfile) {
printf("lock %s\n", pack_lockfile);
@@ -222,12 +205,15 @@ int cmd_fetch_pack(int argc, const char **argv, const 
char *prefix)
 */
for (i = 0; i < nr_sought; i++) {
struct ref *r;
-   for (r = ref; r; r = r->next)
-   if (!sought[i] || refname_match(sought[i]->name, 
r->name))
+   if (sought[i].pattern)
+   continue; /* patterns do not need to match anything */
+   for (r = ref; r; r = r->next) {
+   if (refname_match(sought[i].src, r->name))
break;
+   }
if (r)
continue;
-   error("no such remote ref %s", sought[i]->name);
+   error("no such remote ref %s", sought[i].src);
ret = 1;
}
 
diff --git a/remote.c b/remote.c
index 725a2d39a..08f3c910e 100644
--- a/remote.c
+++ b/remote.c
@@ -612,6 +612,39 @@ static struct refspec *parse_refspec_internal(int 
nr_refspec, const char **refsp
die("Invalid refspec '%s'", refspec[i]);
 }
 
+void parse_ref_or_pattern(struct refspec *refspec, const char *str)
+{
+   struct object_id oid;
+   memset(refspec, 0, sizeof(*refspec));
+
+   if (!get_oid_hex(str, )) {
+   if (str[GIT_SHA1_HEXSZ] == ' ') {
+   struct object_id oid2;
+   /*  , find refname */
+   refspec->src = xstrdup(str + GIT_SHA1_HEXSZ + 1);
+   if (!get_oid_hex(refspec->src, )
+   && !oidcmp(, ))
+   /* The name is actually a SHA-1 */
+   refspec->exact_sha1 = 1;
+   } else if (str[GIT_SHA1_HEXSZ] == '\0') {
+   ; /* , leave sha1 as name */
+   refspec->src = xstrdup(str);
+   refspec->exact_sha1 = 1;
+ 

[RFC 08/14] fetch-pack: check returned refs for matches

2017-01-25 Thread Jonathan Tan
Instead of setting "matched" on matched refs, detect matches by checking
the sought refs against the returned refs.  Also, since the "matched"
field in struct ref is now no longer used, remove it.

This is the 2nd of 3 patches to eliminate using input refs to
communicate information obtained by the fetch mechanism.

(There are possibly ways more efficient than a nested for loop to
accomplish this. However, since a subsequent patch will compare the
user-provided refspecs against the fetched refs directly, and thus
necessitating the nested for loop anyway, I decided to use the nested
for loop in this patch.)

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch-pack.c | 7 ++-
 fetch-pack.c | 9 ++---
 fetch-pack.h | 2 --
 remote.h | 3 +--
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index cfe9e447c..f31f874a0 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "refs.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -220,7 +221,11 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
 * an error.
 */
for (i = 0; i < nr_sought; i++) {
-   if (!sought[i] || sought[i]->matched)
+   struct ref *r;
+   for (r = ref; r; r = r->next)
+   if (!sought[i] || refname_match(sought[i]->name, 
r->name))
+   break;
+   if (r)
continue;
error("no such remote ref %s", sought[i]->name);
ret = 1;
diff --git a/fetch-pack.c b/fetch-pack.c
index 9a87ddd3d..d4642b05c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -562,6 +562,7 @@ static void filter_refs(struct fetch_pack_args *args,
struct ref **newtail = 
struct ref *ref, *next;
int i;
+   int *matched = xcalloc(nr_sought, sizeof(*matched));
 
i = 0;
for (ref = *refs; ref; ref = next) {
@@ -578,7 +579,7 @@ static void filter_refs(struct fetch_pack_args *args,
break; /* definitely do not have it */
else if (cmp == 0) {
keep = 1; /* definitely have it */
-   sought[i]->matched = 1;
+   matched[i] = 1;
}
i++;
}
@@ -604,19 +605,21 @@ static void filter_refs(struct fetch_pack_args *args,
unsigned char sha1[20];
 
ref = sought[i];
-   if (ref->matched)
+   if (matched[i])
continue;
if (get_sha1_hex(ref->name, sha1) ||
ref->name[40] != '\0' ||
hashcmp(sha1, ref->old_oid.hash))
continue;
 
-   ref->matched = 1;
+   matched[i] = 1;
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
}
}
*refs = newlist;
+
+   free(matched);
 }
 
 static void mark_alternate_complete(const struct ref *ref, void *unused)
diff --git a/fetch-pack.h b/fetch-pack.h
index c912e3d32..76f7c719c 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -33,8 +33,6 @@ struct fetch_pack_args {
 
 /*
  * sought represents remote references that should be updated from.
- * On return, the names that were found on the remote will have been
- * marked as such.
  */
 struct ref *fetch_pack(struct fetch_pack_args *args,
   int fd[], struct child_process *conn,
diff --git a/remote.h b/remote.h
index 57d059431..2f7f23d47 100644
--- a/remote.h
+++ b/remote.h
@@ -89,8 +89,7 @@ struct ref {
force:1,
forced_update:1,
expect_old_sha1:1,
-   deletion:1,
-   matched:1;
+   deletion:1;
 
/*
 * Order is important here, as we write to FETCH_HEAD
-- 
2.11.0.483.g087da7b7c-goog



[RFC 09/14] transport: put ref oid in out param

2017-01-25 Thread Jonathan Tan
Return new OID information obtained through fetching in new structs
instead of reusing the existing ones. With this change, the input
structs are no longer used for output, and are thus marked const.

This is the 3rd of 3 patches to eliminate using input refs to
communicate information obtained by the fetch mechanism.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/clone.c  | 14 --
 builtin/fetch-pack.c |  4 ++--
 fetch-pack.c | 26 +++---
 fetch-pack.h |  2 +-
 transport-helper.c   | 34 +++---
 transport.c  |  6 +++---
 transport.h  | 13 +
 7 files changed, 61 insertions(+), 38 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 0135c5f1c..3191da391 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -858,6 +858,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   struct ref *new_remote_refs = NULL;
+
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
@@ -1075,8 +1077,15 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
break;
}
 
-   if (!is_local && !complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs, NULL);
+   if (!is_local && !complete_refs_before_fetch) {
+   transport_fetch_refs(transport, mapped_refs,
+_remote_refs);
+   if (new_remote_refs) {
+   refs = new_remote_refs;
+   free_refs(mapped_refs);
+   mapped_refs = wanted_peer_refs(refs, refspec);
+   }
+   }
 
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
@@ -1148,5 +1157,6 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
junk_mode = JUNK_LEAVE_ALL;
 
free(refspec);
+   free_refs(new_remote_refs);
return err;
 }
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index f31f874a0..a18fd0c44 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -11,7 +11,7 @@ static const char fetch_pack_usage[] =
 "[--include-tag] [--upload-pack=] [--depth=] "
 "[--no-progress] [--diag-url] [-v] [:] [...]";
 
-static void add_sought_entry(struct ref ***sought, int *nr, int *alloc,
+static void add_sought_entry(const struct ref ***sought, int *nr, int *alloc,
 const char *name)
 {
struct ref *ref;
@@ -44,7 +44,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
int i, ret;
struct ref *ref = NULL;
const char *dest = NULL;
-   struct ref **sought = NULL;
+   const struct ref **sought = NULL;
int nr_sought = 0, alloc_sought = 0;
int fd[2];
char *pack_lockfile = NULL;
diff --git a/fetch-pack.c b/fetch-pack.c
index d4642b05c..8cc85c19f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -52,6 +52,10 @@ static int non_common_revs, multi_ack, use_sideband;
 #define ALLOW_REACHABLE_SHA1   02
 static unsigned int allow_unadvertised_object_request;
 
+/* An arbitrary non-NULL non-const pointer to assign to the util field of
+ * string list items when we need one. */
+#define ARBITRARY (_unpack_limit)
+
 __attribute__((format (printf, 2, 3)))
 static inline void print_verbose(const struct fetch_pack_args *args,
 const char *fmt, ...)
@@ -556,7 +560,7 @@ static void mark_recent_complete_commits(struct 
fetch_pack_args *args,
 
 static void filter_refs(struct fetch_pack_args *args,
struct ref **refs,
-   struct ref **sought, int nr_sought)
+   const struct ref **sought, int nr_sought)
 {
struct ref *newlist = NULL;
struct ref **newtail = 
@@ -604,16 +608,16 @@ static void filter_refs(struct fetch_pack_args *args,
for (i = 0; i < nr_sought; i++) {
unsigned char sha1[20];
 
-   ref = sought[i];
+   const struct ref *sref = sought[i];
if (matched[i])
continue;
-   if (get_sha1_hex(ref->name, sha1) ||
-   ref->name[40] != '\0' ||
-   hashcmp(sha1, ref->old_oid.hash))
+   if (get_sha1_hex(sref->name, sha1) ||
+   sref->name[40] != '\0' ||
+   hashcmp(sha1, sref->old_oid.hash))
c

[RFC 14/14] DONT USE advertise_ref_in_want=1

2017-01-25 Thread Jonathan Tan
---
 t/t5500-fetch-pack.sh | 2 ++
 upload-pack.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 18fe23c97..f39dbcab8 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -551,6 +551,7 @@ test_expect_success 'fetch-pack can fetch refs using a 
partial name' '
git init server &&
(
cd server &&
+   git config uploadpack.advertiseRefInWant false
test_commit 1 &&
test_commit 2 &&
git checkout -b one
@@ -587,6 +588,7 @@ test_expect_success 'fetch-pack can fetch refs using a 
glob' '
git init server &&
(
cd server &&
+   git config uploadpack.advertiseRefInWant false
test_commit 1 &&
test_commit 2 &&
git checkout -b ona &&
diff --git a/upload-pack.c b/upload-pack.c
index 0678c53d6..4998a8c7e 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -62,7 +62,7 @@ static int use_sideband;
 static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
-static int advertise_ref_in_want;
+static int advertise_ref_in_want = 1;
 
 static void reset_timeout(void)
 {
-- 
2.11.0.483.g087da7b7c-goog



[RFC 07/14] fetch-pack: put shallow info in out param

2017-01-25 Thread Jonathan Tan
Expand the transport fetch method signature, by adding an out parameter,
to allow transports to return information about the refs they have
fetched. Then communicate shallow status information through this
mechanism instead of by modifying the input list of refs.

This does require clients to sometimes generate the ref map twice: one
generation from the list of refs provided by the remote (as is currently
done) and potentially one regeneration from the new list of refs that
the fetch mechanism provides (added in this patch). The double
generation may result in duplicate error messages when a remote-tracking
branch seems to track more than one remote branch.

This is the 1st of 3 patches to eliminate using input refs to
communicate information obtained by the fetch mechanism.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/clone.c|  4 ++--
 builtin/fetch.c| 23 +++
 fetch-pack.c   | 18 +++---
 remote.c   | 12 +++-
 remote.h   |  1 +
 t/t5536-fetch-conflicts.sh |  2 ++
 transport-helper.c |  5 +++--
 transport.c| 32 ++--
 transport.h| 12 ++--
 9 files changed, 85 insertions(+), 24 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5ef81927a..0135c5f1c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1076,7 +1076,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
if (!is_local && !complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
@@ -1115,7 +1115,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
   branch_top.buf, reflog_msg.buf, transport, 
!is_local);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index ae7c6daa1..19e3f40a0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -911,11 +911,13 @@ static int quickfetch(struct ref *ref_map)
return check_connected(iterate_ref_map, , );
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static int fetch_refs(struct transport *transport, struct ref *ref_map,
+ struct ref **updated_remote_refs)
 {
int ret = quickfetch(ref_map);
if (ret)
-   ret = transport_fetch_refs(transport, ref_map);
+   ret = transport_fetch_refs(transport, ref_map,
+  updated_remote_refs);
if (ret)
transport_unlock_pack(transport);
return ret;
@@ -1066,7 +1068,7 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   if (!fetch_refs(transport, ref_map))
+   if (!fetch_refs(transport, ref_map, NULL))
consume_refs(transport, ref_map);
 
if (gsecondary) {
@@ -1082,6 +1084,7 @@ static int do_fetch(struct transport *transport,
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
const struct ref *remote_refs;
+   struct ref *new_remote_refs = NULL;
 
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
@@ -1123,7 +1126,19 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
+   if (fetch_refs(transport, ref_map, _remote_refs)) {
+   free_refs(ref_map);
+   retcode = 1;
+   goto cleanup;
+   }
+   if (new_remote_refs) {
+   free_refs(ref_map);
+   ref_map = get_ref_map(transport->remote, new_remote_refs,
+ refs, ref_count, tags, );
+   free_refs(new_remote_refs);
+   }
+
+   if (consume_refs(transport, ref_map)) {
free_refs(ref_map);
retcode = 1;
goto cleanup;
diff --git a/fetch-pack.c b/fetch-pack.c
index 601f0779a..9a87ddd3d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -974,12 +974,13 @@ static int remove_duplicates_in_refs(struct

[RFC 11/14] fetch-pack: support want-ref

2017-01-25 Thread Jonathan Tan
Teach fetch-pack to use the want-ref mechanism whenever the server
advertises it.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch-pack.c  |   5 +-
 fetch-pack.c  | 173 --
 fetch-pack.h  |   2 +
 t/t5500-fetch-pack.sh |  42 
 transport.c   |   2 +-
 5 files changed, 175 insertions(+), 49 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 5f14242ae..ae073ab24 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -179,8 +179,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
get_remote_heads(fd[0], NULL, 0, , 0, NULL, );
get_ref_array(_refs, _sought_refs, ref, sought, nr_sought);
 
-   ref = fetch_pack(, fd, conn, ref, dest, sought_refs, 
nr_sought_refs,
-, pack_lockfile_ptr);
+   ref = fetch_pack(, fd, conn, ref, dest, sought, nr_sought,
+sought_refs, nr_sought_refs, ,
+pack_lockfile_ptr);
if (pack_lockfile) {
printf("lock %s\n", pack_lockfile);
fflush(stdout);
diff --git a/fetch-pack.c b/fetch-pack.c
index 8cc85c19f..02149c930 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -219,11 +219,19 @@ static void consume_shallow_list(struct fetch_pack_args 
*args, int fd)
}
 }
 
-static enum ack_type get_ack(int fd, unsigned char *result_sha1)
+/*
+ * Reads an ACK or NAK from fd. If wanted_ref_tail is not NULL, also accepts
+ * any "wanted-ref" lines before that ACK or NAK, writing them to
+ * wanted_ref_tail.
+ */
+static enum ack_type get_ack(int fd, unsigned char *result_sha1,
+struct ref ***wanted_ref_tail)
 {
int len;
-   char *line = packet_read_line(fd, );
+   char *line;
const char *arg;
+start:
+   line = packet_read_line(fd, );
 
if (!len)
die(_("git fetch-pack: expected ACK/NAK, got EOF"));
@@ -244,7 +252,19 @@ static enum ack_type get_ack(int fd, unsigned char 
*result_sha1)
return ACK;
}
}
-   die(_("git fetch-pack: expected ACK/NAK, got '%s'"), line);
+   if (wanted_ref_tail) {
+   struct object_id oid;
+   if (skip_prefix(line, "wanted-ref ", ) &&
+   !get_sha1_hex(arg, oid.hash) && arg[40] == ' ' && arg[41]) {
+   struct ref *ref = alloc_ref(arg + 41);
+   oidcpy(>old_oid, );
+   **wanted_ref_tail = ref;
+   *wanted_ref_tail = >next;
+   goto start;
+   }
+   die(_("git fetch_pack: expected ACK/NAK or wanted-ref, got 
'%s'"), line);
+   }
+   die(_("git fetch_pack: expected ACK/NAK, got '%s'"), line);
 }
 
 static void send_request(struct fetch_pack_args *args,
@@ -282,29 +302,55 @@ static int next_flush(struct fetch_pack_args *args, int 
count)
return count;
 }
 
-static int find_common(struct fetch_pack_args *args,
-  int fd[2], unsigned char *result_sha1,
-  struct ref *refs)
+static void write_capabilities(struct strbuf *sb,
+  const struct fetch_pack_args *args)
 {
-   int fetching;
-   int count = 0, flushes = 0, flush_at = INITIAL_FLUSH, retval;
-   const unsigned char *sha1;
-   unsigned in_vain = 0;
-   int got_continue = 0;
-   int got_ready = 0;
-   struct strbuf req_buf = STRBUF_INIT;
-   size_t state_len = 0;
+   if (multi_ack == 2) strbuf_addstr(sb, " multi_ack_detailed");
+   if (multi_ack == 1) strbuf_addstr(sb, " multi_ack");
+   if (no_done)strbuf_addstr(sb, " no-done");
+   if (use_sideband == 2)  strbuf_addstr(sb, " side-band-64k");
+   if (use_sideband == 1)  strbuf_addstr(sb, " side-band");
+   if (args->deepen_relative) strbuf_addstr(sb, " deepen-relative");
+   if (args->use_thin_pack) strbuf_addstr(sb, " thin-pack");
+   if (args->no_progress)   strbuf_addstr(sb, " no-progress");
+   if (args->include_tag)   strbuf_addstr(sb, " include-tag");
+   if (prefer_ofs_delta)   strbuf_addstr(sb, " ofs-delta");
+   if (deepen_since_ok)strbuf_addstr(sb, " deepen-since");
+   if (deepen_not_ok)  strbuf_addstr(sb, " deepen-not");
+   if (agent_supported)strbuf_addf(sb, " agent=%s",
+   git_user_agent_sanitized());
+}
 
-   if (args->stateless_rpc && multi_ack == 1)
-   die(_("--stateless-rpc requires multi_ack_detailed"));
-   if (marked)
-   

[RFC 03/14] upload-pack: test negotiation with changing repo

2017-01-25 Thread Jonathan Tan
Make upload-pack report "not our ref" errors to the client. (If not, the
client would be left waiting for a response when the server is already
dead.)

Add tests to check the behavior of upload-pack and fetch-pack when
upload-pack is served from a changing repository (for example, when
different servers in a load-balancing agreement participate in the same
stateless RPC negotiation). This forms a baseline of comparison to the
ref-in-want functionality (which will be introduced in subsequent
commits), and ensures that subsequent commits do not change existing
behavior.

As part of this effort, a mechanism to substitute strings in an HTTP
response only on the first invocation is added.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 t/lib-httpd.sh |  1 +
 t/lib-httpd/apache.conf|  8 
 t/lib-httpd/one-time-sed.sh|  8 
 t/t5552-upload-pack-ref-in-want.sh | 91 ++
 upload-pack.c  |  6 ++-
 5 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 t/lib-httpd/one-time-sed.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465..84f8efdd4 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
install_script error.sh
+   install_script one-time-sed.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 69174c6e3..ef218ff15 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -106,9 +106,14 @@ Alias /auth/dumb/ www/auth/dumb/
SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
SetEnv GIT_HTTP_EXPORT_ALL
 
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
 ScriptAlias /error/ error.sh/
+ScriptAliasMatch /one_time_sed/(.*) one-time-sed.sh/$1
 
Options FollowSymlinks
 
@@ -118,6 +123,9 @@ ScriptAlias /error/ error.sh/
 
   Options ExecCGI
 
+
+   Options ExecCGI
+
 
Options ExecCGI
 
diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh
new file mode 100644
index 0..060ec0300
--- /dev/null
+++ b/t/lib-httpd/one-time-sed.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+if [ -e one-time-sed ]; then
+   "$GIT_EXEC_PATH/git-http-backend" | sed "$(cat one-time-sed)"
+   rm one-time-sed
+else
+   "$GIT_EXEC_PATH/git-http-backend"
+fi
diff --git a/t/t5552-upload-pack-ref-in-want.sh 
b/t/t5552-upload-pack-ref-in-want.sh
index 3496af641..80cf2263a 100755
--- a/t/t5552-upload-pack-ref-in-want.sh
+++ b/t/t5552-upload-pack-ref-in-want.sh
@@ -292,4 +292,95 @@ test_expect_success 'hideRefs with namespaces' '
check_output
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+LOCAL_PRISTINE="$(pwd)/local_pristine"
+
+test_expect_success 'setup repos for change-while-negotiating test' '
+   (
+   git init "$REPO" &&
+   cd "$REPO" &&
+   >.git/git-daemon-export-ok &&
+   test_commit m1 &&
+   git tag -d m1 &&
+
+   # Local repo with many commits (so that negotiation will take
+   # more than 1 request/response pair)
+   git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo; 
"$LOCAL_PRISTINE" &&
+   cd "$LOCAL_PRISTINE" &&
+   git checkout -b side &&
+   for i in $(seq 1 33); do test_commit s$i; done &&
+
+   # Add novel commits to upstream
+   git checkout master &&
+   cd "$REPO" &&
+   test_commit m2 &&
+   test_commit m3 &&
+   git tag -d m2 m3
+   ) &&
+   git -C "$LOCAL_PRISTINE" remote set-url origin 
"http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo;
+'
+
+inconsistency() {
+   # Simulate that the server initially reports $2 as the ref
+   # corresponding to $1, and after that, $1 as the ref corresponding to
+   # $1. This corresponds to the real-life situation where the server's
+   # repository appears to change during negotiation, for example, when
+   # different servers in a load-balancing arrangement serve (stateless)
+   # RPCs during a single negotiation.
+   printf "s/%s/%s/" \
+  $(git -C "$REPO" rev-parse $1 | tr -d "\n") \
+  $(git -C "$REPO" rev-parse $2 | tr -d "\n") \
+   

[RFC 05/14] fetch: refactor fetch_refs into two functions

2017-01-25 Thread Jonathan Tan
Refactor the fetch_refs function into a function that does the fetching
of refs and another function that stores them. This prepares for a
future patch where some processing may be done between those tasks.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index c71d5eb9b..43e35c494 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -918,10 +918,16 @@ static int fetch_refs(struct transport *transport, struct 
ref *ref_map)
int ret = quickfetch(ref_map);
if (ret)
ret = transport_fetch_refs(transport, ref_map);
-   if (!ret)
-   ret |= store_updated_refs(transport->url,
-   transport->remote->name,
-   ref_map);
+   if (ret)
+   transport_unlock_pack(transport);
+   return ret;
+}
+
+static int consume_refs(struct transport *transport, struct ref *ref_map)
+{
+   int ret = store_updated_refs(transport->url,
+transport->remote->name,
+ref_map);
transport_unlock_pack(transport);
return ret;
 }
@@ -1062,7 +1068,8 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   fetch_refs(transport, ref_map);
+   if (!fetch_refs(transport, ref_map))
+   consume_refs(transport, ref_map);
 
if (gsecondary) {
transport_disconnect(gsecondary);
@@ -1115,7 +1122,7 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map)) {
+   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
free_refs(ref_map);
retcode = 1;
goto cleanup;
-- 
2.11.0.483.g087da7b7c-goog



[RFC 12/14] fetch-pack: do not printf after closing stdout

2017-01-25 Thread Jonathan Tan
In fetch-pack, during a stateless RPC, printf is invoked after stdout is
closed. Update the code to not do this, preserving the existing
behavior.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch-pack.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index ae073ab24..24af3b7c5 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -191,10 +191,10 @@ int cmd_fetch_pack(int argc, const char **argv, const 
char *prefix)
printf("connectivity-ok\n");
fflush(stdout);
}
-   close(fd[0]);
-   close(fd[1]);
-   if (finish_connect(conn))
-   return 1;
+   if (finish_connect(conn)) {
+   ret = 1;
+   goto cleanup;
+   }
 
ret = !ref;
 
@@ -218,11 +218,17 @@ int cmd_fetch_pack(int argc, const char **argv, const 
char *prefix)
ret = 1;
}
 
+   if (args.stateless_rpc)
+   goto cleanup;
+
while (ref) {
printf("%s %s\n",
   oid_to_hex(>old_oid), ref->name);
ref = ref->next;
}
 
+cleanup:
+   close(fd[0]);
+   close(fd[1]);
return ret;
 }
-- 
2.11.0.483.g087da7b7c-goog



[RFC 13/14] fetch: send want-ref and receive fetched refs

2017-01-25 Thread Jonathan Tan
Teach fetch to send refspecs to the underlying transport, and teach all
components used by the HTTP transport (remote-curl, transport-helper,
fetch-pack) to understand and propagate the names and SHA-1s of the refs
fetched.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/clone.c|   4 +-
 builtin/fetch-pack.c   |   8 ++-
 builtin/fetch.c| 100 ++---
 remote-curl.c  |  91 -
 t/t5552-upload-pack-ref-in-want.sh |   4 +-
 transport-helper.c |  74 +--
 transport.c|  10 ++--
 transport.h|  20 +---
 8 files changed, 229 insertions(+), 82 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 3191da391..765a3a3b6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1078,7 +1078,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
if (!is_local && !complete_refs_before_fetch) {
-   transport_fetch_refs(transport, mapped_refs,
+   transport_fetch_refs(transport, NULL, 0, mapped_refs,
 _remote_refs);
if (new_remote_refs) {
refs = new_remote_refs;
@@ -1124,7 +1124,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs, NULL);
+   transport_fetch_refs(transport, NULL, 0, mapped_refs, NULL);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
   branch_top.buf, reflog_msg.buf, transport, 
!is_local);
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 24af3b7c5..ed1608c12 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -35,6 +35,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct fetch_pack_args args;
struct sha1_array shallow = SHA1_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
+   int always_print_refs = 0;
 
packet_trace_identity("fetch-pack");
 
@@ -126,6 +127,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
args.update_shallow = 1;
continue;
}
+   if (!strcmp("--always-print-refs", arg)) {
+   always_print_refs = 1;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
@@ -218,7 +223,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
ret = 1;
}
 
-   if (args.stateless_rpc)
+   if (args.stateless_rpc && !always_print_refs)
goto cleanup;
 
while (ref) {
@@ -226,6 +231,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
   oid_to_hex(>old_oid), ref->name);
ref = ref->next;
}
+   fflush(stdout);
 
 cleanup:
close(fd[0]);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 19e3f40a0..87de00e49 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -302,10 +302,75 @@ static void find_non_local_tags(const struct ref *refs,
string_list_clear(_refs, 0);
 }
 
+static void get_effective_refspecs(struct refspec **e_rs, int *e_rs_nr,
+  const struct remote *remote,
+  const struct refspec *cli_rs, int cli_rs_nr,
+  int tags, int *autotags)
+{
+   static struct refspec head_refspec;
+
+   const struct refspec *base_rs;
+   int base_rs_nr;
+   struct branch *merge_branch = NULL;
+   int i;
+
+   struct refspec *rs;
+   int nr, alloc;
+
+   if (cli_rs_nr) {
+   base_rs = cli_rs;
+   base_rs_nr = cli_rs_nr;
+   } else if (refmap_array) {
+   die("--refmap option is only meaningful with command-line 
refspec(s).");
+   } else {
+   /* Use the defaults */
+   struct branch *branch = branch_get(NULL);
+   int has_merge = branch_has_merge_config(branch);
+   /* Note: has_merge implies non-NULL branch->remote_name */
+   if (has_merge && !strcmp(branch->remote_name, remote->name))
+   /*
+* if the remote we're fetching from is the same
+* as given in branch..remote, we add the
+* ref given in branch..merge, too.
+*/
+ 

[RFC 00/14] Allow fetch-pack to send ref names (globs allowed)

2017-01-25 Thread Jonathan Tan
Hello everyone - this is a proposal for a protocol change to allow the
fetch-pack/upload-pack to converse in terms of ref names (globs
allowed), and also an implementation of the server (upload-pack) and
fetch-from-HTTP client (fetch-pack invoked through fetch).

Negotiation currently happens by upload-pack initially sending a list of
refs with names and SHA-1 hashes, and then several request/response
pairs in which the request from fetch-pack consists of SHA-1 hashes
(selected from the initial list). Allowing the request to consist of
names instead of SHA-1 hashes increases tolerance to refs changing
(due to time, and due to having load-balanced servers without strong
consistency), and is a step towards eliminating the need for the server
to send the list of refs first (possibly improving performance).

The protocol is extended by allowing fetch-pack to send
"want-ref ", where  is a full name (refs/...) [1], possibly
including glob characters. Upload-pack will inform the client of the
refs actually matched by sending "wanted-ref  " before
sending the last ACK or NAK.

This patch set is laid out in this way:
 1-2:
  Upload-pack, protocol documentation, tests that test upload-pack
  independently. A configuration option is added to control if the
  "ref-in-want" capability is advertised. (It is always supported even
  if not advertised - this is so that this feature in multiple
  load-balanced servers can be switched on or off without needing any
  atomic switching.)
 3:
  Mechanism to test a repo that changes over the negotiation (currently,
  only with the existing mechanism).
 4-9:
  The current transport mechanism takes in an array of refs which is
  used as both input and output. Since we are planning to extend the
  transport mechanism to also allow the specification of ref names
  (which may include globs, and thus do not have a 1-1 correspondence to
  refs), refactor to make this parameter to be solely an input
  parameter.
 10-11:
  Changes to fetch-pack (which is used by remote-curl) to support
  "want-ref".
 12-13:
  Changes to the rest (fetch -> transport -> transport-helper ->
  remote-curl) to support "want-ref" when fetching from HTTP. The
  transport fetch function signature has been widened to allow passing
  in ref names - transports may use those ref names instead of or in
  addition to refs if they support it. (I chose to preserve refs in the
  function signature because many parts of Git, including remote
  helpers, only understand the old functionality anyway, and because
  precalculating the refs allows some optimizations.)
 14:
  This is not meant for submission - this is just to show that the tests
  pass if ref-in-want was advertised by default (except for some newly
  added tests that explicitly check for the old behavior).

[1] There has been some discussion about whether the server should
accept partial ref names, e.g. [2]. In this patch set, I have made the
server only accept full names, and it is the responsibility of the
client to send the multiple patterns which it wants to match. Quoting
from the commit message of the second patch:

  For example, a client could reasonably expand an abbreviated
  name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
  refs/tags/foo", among others, and ensure that at least one such ref has
  been fetched.

[2] <20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net>

Jonathan Tan (14):
  upload-pack: move parsing of "want" line
  upload-pack: allow ref name and glob requests
  upload-pack: test negotiation with changing repo
  fetch: refactor the population of hashes
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in out param
  fetch-pack: check returned refs for matches
  transport: put ref oid in out param
  fetch-pack: support partial names and globs
  fetch-pack: support want-ref
  fetch-pack: do not printf after closing stdout
  fetch: send want-ref and receive fetched refs
  DONT USE advertise_ref_in_want=1

 Documentation/technical/http-protocol.txt |  20 +-
 Documentation/technical/pack-protocol.txt |  24 +-
 Documentation/technical/protocol-capabilities.txt |   6 +
 builtin/clone.c   |  16 +-
 builtin/fetch-pack.c  |  64 ++--
 builtin/fetch.c   | 178 +++---
 fetch-pack.c  | 226 +
 fetch-pack.h  |   6 +-
 remote-curl.c |  91 +++--
 remote.c  |  67 +++-
 remote.h  |  20 +-
 t/lib-httpd.sh|   1 +
 t/lib-httpd/apache.conf   |   8 +
 t/lib-httpd/one-time-sed.sh  

[RFC 04/14] fetch: refactor the population of hashes

2017-01-25 Thread Jonathan Tan
Populate SHA-1 ref hashes in get_ref_map instead of do_fetch. Besides
tightening scopes of variables in the code, this also prepares for a
future patch where get_ref_map is called multiple times within do_fetch.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/fetch.c | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f1570e346..c71d5eb9b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -316,6 +316,8 @@ static struct ref *get_ref_map(struct transport *transport,
 
const struct ref *remote_refs = transport_get_remote_refs(transport);
 
+   struct string_list existing_refs = STRING_LIST_INIT_DUP;
+
if (refspec_count) {
struct refspec *fetch_refspec;
int fetch_refspec_nr;
@@ -411,7 +413,23 @@ static struct ref *get_ref_map(struct transport *transport,
tail = >next;
}
 
-   return ref_remove_duplicates(ref_map);
+   ref_map = ref_remove_duplicates(ref_map);
+
+   for_each_ref(add_existing, _refs);
+   for (rm = ref_map; rm; rm = rm->next) {
+   if (rm->peer_ref) {
+   struct string_list_item *peer_item =
+   string_list_lookup(_refs,
+  rm->peer_ref->name);
+   if (peer_item) {
+   struct object_id *old_oid = peer_item->util;
+   oidcpy(>peer_ref->old_oid, old_oid);
+   }
+   }
+   }
+   string_list_clear(_refs, 1);
+
+   return ref_map;
 }
 
 #define STORE_REF_ERROR_OTHER 1
@@ -1055,14 +1073,10 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
 static int do_fetch(struct transport *transport,
struct refspec *refs, int ref_count)
 {
-   struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct ref *ref_map;
-   struct ref *rm;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
 
-   for_each_ref(add_existing, _refs);
-
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
tags = TAGS_SET;
@@ -1084,18 +1098,6 @@ static int do_fetch(struct transport *transport,
if (!update_head_ok)
check_not_current_branch(ref_map);
 
-   for (rm = ref_map; rm; rm = rm->next) {
-   if (rm->peer_ref) {
-   struct string_list_item *peer_item =
-   string_list_lookup(_refs,
-  rm->peer_ref->name);
-   if (peer_item) {
-   struct object_id *old_oid = peer_item->util;
-   oidcpy(>peer_ref->old_oid, old_oid);
-   }
-   }
-   }
-
if (tags == TAGS_DEFAULT && autotags)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
if (prune) {
@@ -1132,7 +1134,6 @@ static int do_fetch(struct transport *transport,
}
 
  cleanup:
-   string_list_clear(_refs, 1);
return retcode;
 }
 
-- 
2.11.0.483.g087da7b7c-goog



[RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-25 Thread Jonathan Tan
Currently, while performing packfile negotiation [1], upload-pack allows
clients to specify their desired objects only as SHA-1s. This causes:
(a) vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, upload-pack is
provided by multiple Git servers in a load-balancing arrangement,
and
(b) dependence on the server first publishing a list of refs with
associated objects.

To eliminate (a) and take a step towards eliminating (b), teach
upload-pack to support requests in the form of ref names and globs (in
addition to the existing support for SHA-1s) through a new line of the
form "want-ref " where ref is the full name of a ref, a glob
pattern, or a SHA-1. At the conclusion of negotiation, the server will
write "wanted-ref  " for all requests that have been
specified this way.

The server indicates that it supports this feature by advertising the
capability "ref-in-want". Advertisement of this capability is by default
disabled, but can be enabled through a configuration option.

To be flexible with respect to client needs, the server does not
indicate an error if a "want-ref" line corresponds to no refs, but
instead relies on the client to ensure that what the user needs has been
fetched. For example, a client could reasonably expand an abbreviated
name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
refs/tags/foo", among others, and ensure that at least one such ref has
been fetched.

[1] pack-protocol.txt

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/technical/http-protocol.txt |  20 +-
 Documentation/technical/pack-protocol.txt |  24 +-
 Documentation/technical/protocol-capabilities.txt |   6 +
 t/t5552-upload-pack-ref-in-want.sh| 295 ++
 upload-pack.c |  89 ++-
 5 files changed, 411 insertions(+), 23 deletions(-)
 create mode 100755 t/t5552-upload-pack-ref-in-want.sh

diff --git a/Documentation/technical/http-protocol.txt 
b/Documentation/technical/http-protocol.txt
index 1c561bdd9..162d6bc07 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -316,7 +316,8 @@ to prevent caching of the response.
 
 Servers SHOULD support all capabilities defined here.
 
-Clients MUST send at least one "want" command in the request body.
+Clients MUST send at least one "want" or "want-ref" command in the
+request body.
 Clients MUST NOT reference an id in a "want" command which did not
 appear in the response obtained through ref discovery unless the
 server advertises capability `allow-tip-sha1-in-want` or
@@ -330,7 +331,7 @@ server advertises capability `allow-tip-sha1-in-want` or
   want_list =  PKT-LINE(want NUL cap_list LF)
   *(want_pkt)
   want_pkt  =  PKT-LINE(want LF)
-  want  =  "want" SP id
+  want  =  "want" SP id / "want-ref" SP name
   cap_list  =  *(SP capability) SP
 
   have_list =  *PKT-LINE("have" SP id LF)
@@ -352,7 +353,8 @@ C: Build an empty set, `common`, to hold the objects that 
are later
determined to be on both ends.
 
 C: Build a set, `want`, of the objects from `advertised` the client
-   wants to fetch, based on what it saw during ref discovery.
+   wants to fetch, based on what it saw during ref discovery. This is to
+   be used if the server does not support the ref-in-want capability.
 
 C: Start a queue, `c_pending`, ordered by commit time (popping newest
first).  Add all client refs.  When a commit is popped from
@@ -363,8 +365,8 @@ C: Start a queue, `c_pending`, ordered by commit time 
(popping newest
 
 C: Send one `$GIT_URL/git-upload-pack` request:
 
-   C: 0032want ...
-   C: 0032want ...
+   C: want-ref ...
+   C: want-ref ...

C: 0032have .
C: 0032have .
@@ -384,7 +386,7 @@ the pkt-line value.
 Commands MUST appear in the following order, if they appear
 at all in the request stream:
 
-* "want"
+* "want" or "want-ref"
 * "have"
 
 The stream is terminated by a pkt-line flush (``).
@@ -393,6 +395,9 @@ A single "want" or "have" command MUST have one hex 
formatted
 SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
 multiple commands.
 
+A "want-ref" command MUST be followed by a ref name (which may include
+shell glob characters) or a hex formatted SHA-1.
+
 The `have` list is created by popping the first 32 commits
 from `c_pending`.  Less can be supplied if `c_pending` empties.
 
@@ -410,6 +415,9 @@ Verify a

[RFC 01/14] upload-pack: move parsing of "want" line

2017-01-25 Thread Jonathan Tan
Refactor to parse "want" lines when the prefix is found. This makes it
easier to add support for another prefix, which will be done in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 upload-pack.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 7597ba340..15c60a204 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -793,8 +793,20 @@ static void receive_needs(void)
deepen_rev_list = 1;
continue;
}
-   if (!skip_prefix(line, "want ", ) ||
-   get_sha1_hex(arg, sha1_buf))
+   if (skip_prefix(line, "want ", ) &&
+   !get_sha1_hex(arg, sha1_buf)) {
+   o = parse_object(sha1_buf);
+   if (!o)
+   die("git upload-pack: not our ref %s",
+   sha1_to_hex(sha1_buf));
+   if (!(o->flags & WANTED)) {
+   o->flags |= WANTED;
+   if (!((allow_unadvertised_object_request & 
ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1
+ || is_our_ref(o)))
+   has_non_tip = 1;
+   add_object_array(o, NULL, _obj);
+   }
+   } else
die("git upload-pack: protocol error, "
"expected to get sha, not '%s'", line);
 
@@ -820,18 +832,6 @@ static void receive_needs(void)
no_progress = 1;
if (parse_feature_request(features, "include-tag"))
use_include_tag = 1;
-
-   o = parse_object(sha1_buf);
-   if (!o)
-   die("git upload-pack: not our ref %s",
-   sha1_to_hex(sha1_buf));
-   if (!(o->flags & WANTED)) {
-   o->flags |= WANTED;
-   if (!((allow_unadvertised_object_request & 
ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1
- || is_our_ref(o)))
-   has_non_tip = 1;
-   add_object_array(o, NULL, _obj);
-   }
}
 
/*
-- 
2.11.0.483.g087da7b7c-goog



Re: [PATCH 2/2] mailinfo: Understand forwarded patches

2017-01-12 Thread Jonathan Tan

On 01/12/2017 01:20 AM, Matthew Wilcox wrote:

From: Matthew Wilcox 

Extend the --scissors mechanism to strip off the preamble created by
forwarding a patch.  There are a couple of extra headers ("Sent" and
"To") added by forwarding, but other than that, the --scissors option
will now remove patches forwarded from Microsoft Outlook to a Linux
email account.

Signed-off-by: Matthew Wilcox 


Also add a test showing the kind of message that the current code 
doesn't handle, and that this commit addresses.



---
 mailinfo.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mailinfo.c b/mailinfo.c
index 2059704a8..fc1275532 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi, struct 
strbuf *subject)

 #define MAX_HDR_PARSED 10
 static const char *header[MAX_HDR_PARSED] = {
-   "From","Subject","Date",
+   "From","Subject","Date","Sent","To",


Are these extra headers used in both the "real" e-mail headers and the 
in-body headers, or only one of them? (If the latter, they should 
probably be handled only in the relevant function - my previous patches 
to this file were in that direction too, if I remember correctly.) Also, 
I suspect that these will have to be handled differently to the other 3, 
but that will be clearer when you add the test with an example message.



 };

 static inline int cmp_header(const struct strbuf *line, const char *hdr)
@@ -685,6 +685,13 @@ static int is_scissors_line(const char *line)
c++;
continue;
}
+   if (!memcmp(c, "Original Message", 16)) {


1) You can use starts_with or skip_prefix.
2) This seems vulnerable to false positives. If "Original Message" 
always follows a certain kind of line, it might be better to check for 
that. (Again, it will be clearer when we have an example message.)



+   in_perforation = 1;
+   perforation += 16;
+   scissors += 16;
+   c += 15;


Why 15? Also, can skip_prefix avoid these magic numbers?


+   continue;
+   }
in_perforation = 0;
}




Re: [PATCH 1/2] mailinfo: Add support for keep_cr

2017-01-12 Thread Jonathan Tan

On 01/12/2017 01:20 AM, Matthew Wilcox wrote:

From: Matthew Wilcox 

If you have a base-64 encoded patch with CRLF endings (as produced
by forwarding a patch from Outlook to a Linux machine, for example),
the keep_cr setting is not honoured because keep_cr is only passed
to mailsplit, which does not look through the encoding.  The keep_cr
logic needs to be applied after the base64 decode.  I copied that
logic to handle_filter(), and rather than add a new keep_cr parameter
to handle_filter, I opted to add keep_cr to struct mailinfo; it seemed
appropriate given use_scissors was already there.

Then I needed to initialise keep_cr in the struct mailinfo passed from
git-am, and rather than thread a 'keep_cr' parameter all the way through
to parse_mail(), I decided to add keep_cr to struct am_state, which let
it be removed as a parameter from five other functions.

Signed-off-by: Matthew Wilcox 


A test exercising the new functionality would be nice.

Also, maybe a more descriptive title like "mailinfo: also respect 
keep_cr after base64 decode" (50 characters) is better.



@@ -143,6 +144,7 @@ static void am_state_init(struct am_state *state, const 
char *dir)

memset(state, 0, sizeof(*state));

+   state->keep_cr = -1;


Maybe query the git config here (instead of later) so that we never have 
to worry about state->keep_cr being neither 0 nor 1? This function 
already queries the git config anyway.



diff --git a/mailinfo.h b/mailinfo.h
index 04a25351d..9fddcf684 100644
--- a/mailinfo.h
+++ b/mailinfo.h
@@ -12,6 +12,7 @@ struct mailinfo {
struct strbuf email;
int keep_subject;
int keep_non_patch_brackets_in_subject;
+   int keep_cr;
int add_message_id;
int use_scissors;
int use_inbody_headers;


I personally would write "unsigned keep_cr : 1" to further emphasize 
that this can only be 0 or 1, but I don't know if it's better to keep 
with the style existing in the file (that is, using int).


Re: [PATCH] mailinfo.c: move side-effects outside of assert

2016-12-19 Thread Jonathan Tan

On 12/19/2016 12:38 PM, Kyle J. McKay wrote:

On Dec 19, 2016, at 12:03, Jeff King wrote:


On Sat, Dec 17, 2016 at 11:54:18AM -0800, Kyle J. McKay wrote:


Since 6b4b013f18 (mailinfo: handle in-body header continuations,
2016-09-20, v2.11.0) mailinfo.c has contained new code with an
assert of the form:

assert(call_a_function(...))


Thanks for spotting this - I'm not sure how I missed that.


This is obviously an improvement, but it makes me wonder if we should be
doing:

 if (!check_header(mi, >inbody_header_accum, mi->s_hdr_data))
die("BUG: some explanation of why this can never happen");

which perhaps documents the intended assumptions more clearly. A comment
regarding the side effects might also be helpful.


I wondered exactly the same thing myself.  I was hoping Jonathan would
pipe in here with some analysis about whether this is:

  a) a super paranoid, just-in-case, can't really ever fail because by
the time we get to this code we've already effectively validated
everything that could cause check_header to return false in this case

-or-

  b) Yeah, it could fail in the real world and it should "die" (and
probably have a test added that triggers such death)

-or-

  c) Actually, if check_header does return false we can keep going
without problem

-or-

  d) Actually, if check_header does return false we can keep going by
making a minor change that should be in the patch

I assume that since Jonathan added the code he will just know the answer
as to which one it is and I won't have to rely on the results of my
imaginary analysis.  ;)


The answer is "a". The only time that mi->inbody_header_accum is 
appended to is in check_inbody_header, and appending onto a blank 
mi->inbody_header_accum always happens when is_inbody_header is true 
(which guarantees a prefix that causes check_header to always return true).


Peff's suggestion sounds reasonable to me, maybe with an error message 
like "BUG: inbody_header_accum, if not empty, must always contain a 
valid in-body header".


[PATCH] transport-helper: drop broken "unchanged" feature

2016-11-30 Thread Jonathan Tan
Commit f8ec916 ("Allow helpers to report in "list" command that the ref
is unchanged", 2009-11-17) allowed a remote helper to report that a ref
is unchanged even if it did not know the contents of a ref. However,
that commit also made Git wrongly assume that such a remote ref always
has the contents of the local ref of the same name.

(Git also cannot assume that the remote ref has the value of the current
destination local ref, or any other ref, since the previous import/fetch
could have been made using a different refspec.)

Drop that assumption.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---

Here is a patch that would remove the assumption (if it is indeed
wrong).

 Documentation/gitremote-helpers.txt |  3 +++
 transport-helper.c  | 43 -
 2 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 9e8681f..c862339 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -393,6 +393,9 @@ attributes are defined.
 'unchanged'::
This ref is unchanged since the last import or fetch, although
the helper cannot necessarily determine what value that produced.
+   Git may import or fetch this ref anyway, because it does not
+   keep a record of the last values corresponding to the refs of a
+   specific remote.
 
 OPTIONS
 ---
diff --git a/transport-helper.c b/transport-helper.c
index 91aed35..6ab8e2f 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -392,9 +392,6 @@ static int fetch_with_fetch(struct transport *transport,
 
for (i = 0; i < nr_heads; i++) {
const struct ref *posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
-
strbuf_addf(, "fetch %s %s\n",
oid_to_hex(>old_oid),
posn->symref ? posn->symref : posn->name);
@@ -492,9 +489,6 @@ static int fetch_with_import(struct transport *transport,
 
for (i = 0; i < nr_heads; i++) {
posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
-
strbuf_addf(, "import %s\n",
posn->symref ? posn->symref : posn->name);
sendline(data, );
@@ -531,8 +525,6 @@ static int fetch_with_import(struct transport *transport,
for (i = 0; i < nr_heads; i++) {
char *private, *name;
posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
name = posn->symref ? posn->symref : posn->name;
if (data->refspecs)
private = apply_refspecs(data->refspecs, 
data->refspec_nr, name);
@@ -649,21 +641,12 @@ static int fetch(struct transport *transport,
 int nr_heads, struct ref **to_fetch)
 {
struct helper_data *data = transport->data;
-   int i, count;
 
if (process_connect(transport, 0)) {
do_take_over(transport);
return transport->fetch(transport, nr_heads, to_fetch);
}
 
-   count = 0;
-   for (i = 0; i < nr_heads; i++)
-   if (!(to_fetch[i]->status & REF_STATUS_UPTODATE))
-   count++;
-
-   if (!count)
-   return 0;
-
if (data->check_connectivity &&
data->transport_options.check_self_contained_and_connected)
set_helper_option(transport, "check-connectivity", "true");
@@ -1009,23 +992,6 @@ static int push_refs(struct transport *transport,
return -1;
 }
 
-
-static int has_attribute(const char *attrs, const char *attr) {
-   int len;
-   if (!attrs)
-   return 0;
-
-   len = strlen(attr);
-   for (;;) {
-   const char *space = strchrnul(attrs, ' ');
-   if (len == space - attrs && !strncmp(attrs, attr, len))
-   return 1;
-   if (!*space)
-   return 0;
-   attrs = space + 1;
-   }
-}
-
 static struct ref *get_refs_list(struct transport *transport, int for_push)
 {
struct helper_data *data = transport->data;
@@ -1067,15 +1033,6 @@ static struct ref *get_refs_list(struct transport 
*transport, int for_push)
(*tail)->symref = xstrdup(buf.buf + 1);
else if (buf.buf[0] != '?')
get_oid_hex(buf.buf, &(*tail)->old_oid);
-   if (eon) {
-   if (has_attribute(eon + 1, "unchanged")) {
-   (*tail)->status |= REF_STATUS_UPTODATE;
- 

[BUG?] Remote helper's wrong assumption of unchanged ref

2016-11-30 Thread Jonathan Tan
Commit f8ec916 ("Allow helpers to report in "list" command that the ref
is unchanged", 2009-11-17) allowed a remote helper to report that a ref
is unchanged even if it did not know the contents of a ref. However,
that commit also made Git assume that such a remote ref has the contents
of the local ref of the same name. If I'm not missing anything, this
assumption seems wrong; the attached test illustrates one case of local
edits being made after cloning with default parameters.

The original e-mail thread [1] seems to indicate that this feature is
meant for a remote helper with no Git-specific code (which is possible
if it supports "import" but not "fetch" - in this case, it would not
deal with SHA-1s at all) to nevertheless indicate "unchanged", most
likely to support optimizations on the client side.

But it seems to me that Git cannot perform this optimization. In other
words, it should just ignore "unchanged". If this makes sense, I'll
prepare a patch to do this.

[1] "[PATCH 00/13] Native and foreign helpers"
<alpine.lnx.2.00.0908050052390.2...@iabervon.org>
---
 git-remote-testgit.sh |  8 +++-
 t/t5801-remote-helpers.sh | 40 
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/git-remote-testgit.sh b/git-remote-testgit.sh
index 752c763..6357868 100755
--- a/git-remote-testgit.sh
+++ b/git-remote-testgit.sh
@@ -52,7 +52,13 @@ do
echo
;;
list)
-   git for-each-ref --format='? %(refname)' 'refs/heads/'
+   if test -n "$GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX"
+   then
+   git for-each-ref --format='? %(refname)' 'refs/heads/' |
+   sed 
"/${GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX}/s/$/ unchanged/"
+   else
+   git for-each-ref --format='? %(refname)' 'refs/heads/'
+   fi
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo
diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 362b158..4a48f2b 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -301,4 +301,44 @@ test_expect_success 'fetch url' '
compare_refs server HEAD local FETCH_HEAD
 '
 
+test_expect_success 'setup remote repository and divergent clone' '
+   git init s2 &&
+   (
+   cd s2 &&
+   test_commit M1 &&
+   git checkout -b mybranch &&
+   test_commit B1
+   ) &&
+   git clone "testgit::${PWD}/s2" divergent &&
+
+   (
+   cd divergent &&
+   git checkout master &&
+   test_commit M2 &&
+   git checkout mybranch &&
+   test_commit B2
+   )
+'
+
+test_expect_success 'fetch with unchanged claims' '
+   rm -rf local &&
+   cp -r divergent local &&
+
+   # No unchanged branches
+
+   GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=ABCDE git -C local fetch &&
+   compare_refs s2 M1 local refs/remotes/origin/master &&
+   compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+
+   # One unchanged branch
+
+   GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=mybranch git -C local fetch &&
+   compare_refs s2 M1 local refs/remotes/origin/master &&
+
+   # I (Jonathan Tan) would expect refs/remotes/origin/mybranch to be B1,
+   # but it is B2.
+   test_must_fail compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+   compare_refs local B2 local refs/remotes/origin/mybranch
+'
+
 test_done
-- 
2.8.0.rc3.226.g39d4020



Re: What's cooking in git.git (Nov 2016, #05; Wed, 23)

2016-11-28 Thread Jonathan Tan
On Mon, Nov 28, 2016 at 4:06 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Jonathan Tan <jonathanta...@google.com> writes:
>
>> On 11/23/2016 03:21 PM, Junio C Hamano wrote:
>>> * jt/use-trailer-api-in-commands (2016-11-02) 6 commits
>>>  - sequencer: use trailer's trailer layout
>>>  - trailer: have function to describe trailer layout
>>>  - trailer: avoid unnecessary splitting on lines
>>>  - commit: make ignore_non_trailer take buf/len
>>>  - SQUASH???
>>>  - trailer: be stricter in parsing separators
>>>
>>>  Commands that operate on a log message and add lines to the trailer
>>>  blocks, such as "format-patch -s", "cherry-pick (-x|-s)", and
>>>  "commit -s", have been taught to use the logic of and share the
>>>  code with "git interpret-trailer".
>>>
>>>  What's the doneness of this topic?
>>
>> Stefan Beller mentioned [1] that this seemed OK to him from a cursory
>> read. Do I need to look for another reviewer (or a more thorough
>> review)?
>
> I gave it a cursory review when it was queued, too, so another
> cursory read does not help very much ;)  If I recall correctly, I
> got an impression that it was reasonably well done.
>
> I haven't had a chance to look at the series again to see if the
> SQUASH is just the simple matter of squashing it into the one
> previous, which is the main reason why I haven't decided if it is
> ready to be in 'next'.
>
> Thanks.

Sorry, I'm not sure what you mean by this. The SQUASH is an update in
documentation (for a C function) (reproduced below [1]) which can be
squashed cleanly (just to confirm, I tried it using "git rebase -i").
It can also be rebased cleanly onto master or next (at least, as of
now).

If you are asking about whether I think the contents of your SQUASH
commit is reasonable, I think that it is.

[1]
$ git show 7e7587d
commit 7e7587d563ddb540875075e5a84359a8a96a5188
Author: Junio C Hamano <gits...@pobox.com>
Date:   Wed Nov 2 19:28:53 2016 -0700

SQUASH???

diff --git a/trailer.c b/trailer.c
index dc525e3..eefe91d 100644
--- a/trailer.c
+++ b/trailer.c
@@ -565,7 +565,9 @@ static int token_matches_item(const char *tok,
struct arg_item *item, int tok_le
 /*
  * If the given line is of the form
  * "..." or "...", return the
- * location of the separator. Otherwise, return -1.
+ * location of the separator. Otherwise, return -1.  The optional whitespace
+ * is allowed there primarily to allow things like "Bug #43" where  is
+ * "Bug" and  is "#".
  *
  * The separator-starts-line case (in which this function returns 0) is
  * distinguished from the non-well-formed-line case (in which this function


Re: What's cooking in git.git (Nov 2016, #05; Wed, 23)

2016-11-28 Thread Jonathan Tan

On 11/23/2016 03:21 PM, Junio C Hamano wrote:

* jt/use-trailer-api-in-commands (2016-11-02) 6 commits
 - sequencer: use trailer's trailer layout
 - trailer: have function to describe trailer layout
 - trailer: avoid unnecessary splitting on lines
 - commit: make ignore_non_trailer take buf/len
 - SQUASH???
 - trailer: be stricter in parsing separators

 Commands that operate on a log message and add lines to the trailer
 blocks, such as "format-patch -s", "cherry-pick (-x|-s)", and
 "commit -s", have been taught to use the logic of and share the
 code with "git interpret-trailer".

 What's the doneness of this topic?


Stefan Beller mentioned [1] that this seemed OK to him from a cursory 
read. Do I need to look for another reviewer (or a more thorough review)?


[1] 

[PATCH] doc: mention user-configured trailers

2016-11-21 Thread Jonathan Tan
In commit 1462450 ("trailer: allow non-trailers in trailer block",
2016-10-21), functionality was added (and tested [1]) to allow
non-trailer lines in trailer blocks, as long as those blocks contain at
least one Git-generated or user-configured trailer, and consists of at
least 25% trailers. The documentation was updated to mention this new
functionality, but did not mention "user-configured trailer".

Further update the documentation to also mention "user-configured
trailer".

[1] "with non-trailer lines mixed with a configured trailer" in
t/t7513-interpret-trailers.sh

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---

Yes, mentioning a trailer in a Git config will cause interpret-trailers
to treat it similarly to a Git-generated trailer (in that its presence
causes a block partially consisting of trailers to be considered a
trailer block). See the commit message above for a test case that
verifies that.

I took a look at the documentation, and it wasn't completely documented,
so here is a patch to correct that.

 Documentation/git-interpret-trailers.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index e99bda6..09074c7 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -49,7 +49,8 @@ will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
 a group of one or more lines that (i) are all trailers, or (ii) contains at
-least one Git-generated trailer and consists of at least 25% trailers.
+least one Git-generated or user-configured trailer and consists of at
+least 25% trailers.
 The group must be preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on objects

2016-11-14 Thread Jonathan Tan

On 11/14/2016 10:56 AM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:


to:
HEAD:file
HEAD:sub/file

Signed-off-by: Brandon Williams <bmw...@google.com>
---


Unrelated tangent, but this makes readers wonder what the updated
trailer code would do to the last paragraph ;-).  Does it behave
sensibly (with some sane definition of sensibleness)?

I am guessing that it would, because neither To: or HEAD: is what we
normally recognize as a known trailer block element.


Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a
blank line, so the trailer block consists only of that line.


Oh, that was not what I was wondering.  Imagine Brandon writing his
message that ends in these three questionable lines and then running
"commit -s --amend" to add his sign-off---that was the case I was
wondering.


Ah, I see. In that case, it would consider the last block as a trailer 
block and attach it directly:


  to:
  HEAD:file
  HEAD:sub/file
  Signed-off-by: ...

It is true that neither to: nor HEAD: are known trailers, but my patch 
set accepts trailer blocks that are 100% well-formed regardless of 
whether the trailers are known (to provide backwards compatibility with 
git-interpret-trailers, and to satisfy the certain use cases that I 
brought up). The "known trailer" check is used when the trailer block is 
not 100% well-formed.


This issue can be avoided if those lines were indented with at least one 
space or at least one tab.


Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on objects

2016-11-14 Thread Jonathan Tan

On 11/14/2016 10:10 AM, Junio C Hamano wrote:

Brandon Williams  writes:


Teach grep to recursively search in submodules when provided with a
 object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a  object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of it's output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
:sub/file

to:
HEAD:file
HEAD:sub/file

Signed-off-by: Brandon Williams 
---


Unrelated tangent, but this makes readers wonder what the updated
trailer code would do to the last paragraph ;-).  Does it behave
sensibly (with some sane definition of sensibleness)?

I am guessing that it would, because neither To: or HEAD: is what we
normally recognize as a known trailer block element.


Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a 
blank line, so the trailer block consists only of that line.


Having said that, it is probably better to indent those examples in the 
commit message (by at least one space or one tab) - then they will never 
be confused with trailers (once my patch set is in).


Re: [PATCH v2 5/6] grep: enable recurse-submodules to work on objects

2016-11-11 Thread Jonathan Tan

On 10/31/2016 03:38 PM, Brandon Williams wrote:

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..386a868 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
   [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
-  [--recurse-submodules]
+  [--recurse-submodules] [--parent-basename]


Maybe add something after --parent-basename, since it takes an argument 
(like --threads above).



@@ -91,7 +91,16 @@ OPTIONS

 --recurse-submodules::
Recursively search in each submodule that has been initialized and
-   checked out in the repository.
+   checked out in the repository.  When used in combination with the
+option the prefix of all submodule output will be the name of
+   the parent project's  object.
+
+--parent-basename::


Same comment as above.


diff --git a/builtin/grep.c b/builtin/grep.c
index cf4f51e..2f10930 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"

 static char const * const grep_usage[] = {
N_("git grep [] [-e]  [...] [[--] ...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;


Can this be passed as an argument to the functions (grep_objects and 
grep_object, it seems) instead of having a file-visible variable?



@@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct 
pathspec *pathspec,
enum interesting match = entry_not_interesting;
struct name_entry entry;
int old_baselen = base->len;
+   struct strbuf name = STRBUF_INIT;
+   int name_base_len = 0;
+   if (super_prefix) {
+   name_base_len = strlen(super_prefix);
+   strbuf_addstr(, super_prefix);


Better to invoke strbuf_addstr, and then set name_base_len from 
name.len. This makes it clear where strbuf_setlen (subsequently) resets 
the strbuf to, and is also a slight performance improvement.




[PATCH] fetch: do not redundantly calculate tag refmap

2016-11-10 Thread Jonathan Tan
builtin/fetch.c redundantly calculates refmaps for tags twice. Remove
the first calculation.

This is only a code simplification and slight performance improvement -
the result is unchanged, as the redundant refmaps are subsequently
removed by the invocation to "ref_remove_duplicates" anyway.

This was introduced in commit c5a84e9 ("fetch --tags: fetch tags *in
addition to* other stuff", 2013-10-29) when modifying the effect of the
--tags parameter to "git fetch". The refmap-for-tag calculation was
copied instead of moved.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---

(I noticed this when working on something in this file.)

 builtin/fetch.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b6a5597..1d77e58 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -359,9 +359,6 @@ static struct ref *get_ref_map(struct transport *transport,
 
for (i = 0; i < fetch_refspec_nr; i++)
get_fetch_map(ref_map, _refspec[i], _tail, 
1);
-
-   if (tags == TAGS_SET)
-   get_fetch_map(remote_refs, tag_refspec, , 0);
} else if (refmap_array) {
die("--refmap option is only meaningful with command-line 
refspec(s).");
} else {
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH 4/5] grep: optionally recurse into submodules

2016-11-04 Thread Jonathan Tan
On Thu, Oct 27, 2016 at 3:38 PM, Brandon Williams  wrote:
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 8887b6add..f34f16df9 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -18,12 +18,20 @@
>  #include "quote.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "submodule.h"
>
>  static char const * const grep_usage[] = {
> N_("git grep [] [-e]  [...] [[--] ...]"),
> NULL
>  };
>
> +static const char *super_prefix;

I think that the super_prefix changes could be in its own patch.

> +static int recurse_submodules;
> +static struct argv_array submodule_options = ARGV_ARRAY_INIT;

I guess this has to be static because it is shared by multiple threads.

> +
> +static int grep_submodule_launch(struct grep_opt *opt,
> +const struct grep_source *gs);
> +
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
>
> @@ -174,7 +182,10 @@ static void *run(void *arg)
> break;
>
> opt->output_priv = w;
> -   hit |= grep_source(opt, >source);
> +   if (w->source.type == GREP_SOURCE_SUBMODULE)
> +   hit |= grep_submodule_launch(opt, >source);
> +   else
> +   hit |= grep_source(opt, >source);

It seems to me that GREP_SOURCE_SUBMODULE is of a different nature
than the other GREP_SOURCE_.* - in struct work_item, could we instead
have another variable that distinguishes between submodules and
"native" sources? This might also assuage Junio's concerns in
 about the nature of the
sources.

That variable could also be the discriminant for a tagged union, such
that we have "struct grep_source" for the "native" sources and a new
struct (holding only submodule-relevant information) for the
submodule.

> +/*
> + * Prep grep structures for a submodule grep
> + * sha1: the sha1 of the submodule or NULL if using the working tree
> + * filename: name of the submodule including tree name of parent
> + * path: location of the submodule
> + */
> +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
> + const char *filename, const char *path)
> +{
> +   if (!(is_submodule_initialized(path) &&
> + is_submodule_checked_out(path))) {
> +   warning("skiping submodule '%s%s' since it is not initialized 
> and checked out",
> +   super_prefix ? super_prefix: "",
> +   path);
> +   return 0;
> +   }
> +
> +#ifndef NO_PTHREADS
> +   if (num_threads) {
> +   add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
> +   return 0;
> +   } else
> +#endif
> +   {
> +   struct work_item w;
> +   int hit;
> +
> +   grep_source_init(, GREP_SOURCE_SUBMODULE,
> +filename, path, sha1);
> +   strbuf_init(, 0);
> +   opt->output_priv = 
> +   hit = grep_submodule_launch(opt, );
> +
> +   write_or_die(1, w.out.buf, w.out.len);
> +
> +   grep_source_clear();
> +   strbuf_release();
> +   return hit;
> +   }

This is at least the third invocation of this "if pthreads, add work,
otherwise do it now" pattern - could this be extracted into its own
function (in another patch)? Ideally, there would also be exactly one
function in which the grep_source.* functions are invoked, and both
"run" and the non-pthread code path can use it.

> +}
> +
> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> + int cached)

This line isn't modified other than the line break, as far as I can
tell, so I wouldn't break it.

> diff --git a/t/t7814-grep-recurse-submodules.sh 
> b/t/t7814-grep-recurse-submodules.sh
> new file mode 100755
> index 0..b670c70cb
> --- /dev/null
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -0,0 +1,99 @@
> +#!/bin/sh
> +
> +test_description='Test grep recurse-submodules feature
> +
> +This test verifies the recurse-submodules feature correctly greps across
> +submodules.
> +'
> +
> +. ./test-lib.sh
> +

Would it be possible to also test it while num_threads is zero? (Or,
if num_threads is already zero, to test it while it is not zero?)


Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules

2016-11-04 Thread Jonathan Tan

On 10/31/2016 03:38 PM, Brandon Williams wrote:

+   struct strbuf buf = STRBUF_INIT;
+   char *submodule_url = NULL;
+
+   strbuf_addf(, "submodule.%s.url", module->name);
+   ret = !git_config_get_string(buf.buf, _url);
+
+   free(submodule_url);
+   strbuf_release();
+   }
+
+   return ret;
+}
+
+/*
+ * Determine if a submodule has been checked out at a given 'path'
+ */
+int is_submodule_checked_out(const char *path)
+{
+   int ret = 0;
+   struct strbuf buf = STRBUF_INIT;
+
+   strbuf_addf(, "%s/.git", path);
+   ret = file_exists(buf.buf);
+
+   strbuf_release();


In this and the previous function, you can use xstrfmt.


[PATCH v3 2/5] commit: make ignore_non_trailer take buf/len

2016-11-02 Thread Jonathan Tan
Make ignore_non_trailer take a buf/len pair instead of struct strbuf.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/commit.c |  2 +-
 commit.c | 22 +++---
 commit.h |  2 +-
 trailer.c|  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 8976c3d..887ccc7 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -790,7 +790,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
strbuf_stripspace(, 0);
 
if (signoff)
-   append_signoff(, ignore_non_trailer(), 0);
+   append_signoff(, ignore_non_trailer(sb.buf, sb.len), 0);
 
if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
die_errno(_("could not write commit template"));
diff --git a/commit.c b/commit.c
index 856fd4a..2cf8515 100644
--- a/commit.c
+++ b/commit.c
@@ -1649,7 +1649,7 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
 }
 
 /*
- * Inspect sb and determine the true "end" of the log message, in
+ * Inspect the given string and determine the true "end" of the log message, in
  * order to find where to put a new Signed-off-by: line.  Ignored are
  * trailing comment lines and blank lines, and also the traditional
  * "Conflicts:" block that is not commented out, so that we can use
@@ -1659,37 +1659,37 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-int ignore_non_trailer(struct strbuf *sb)
+int ignore_non_trailer(const char *buf, size_t len)
 {
int boc = 0;
int bol = 0;
int in_old_conflicts_block = 0;
 
-   while (bol < sb->len) {
-   char *next_line;
+   while (bol < len) {
+   const char *next_line = memchr(buf + bol, '\n', len - bol);
 
-   if (!(next_line = memchr(sb->buf + bol, '\n', sb->len - bol)))
-   next_line = sb->buf + sb->len;
+   if (!next_line)
+   next_line = buf + len;
else
next_line++;
 
-   if (sb->buf[bol] == comment_line_char || sb->buf[bol] == '\n') {
+   if (buf[bol] == comment_line_char || buf[bol] == '\n') {
/* is this the first of the run of comments? */
if (!boc)
boc = bol;
/* otherwise, it is just continuing */
-   } else if (starts_with(sb->buf + bol, "Conflicts:\n")) {
+   } else if (starts_with(buf + bol, "Conflicts:\n")) {
in_old_conflicts_block = 1;
if (!boc)
boc = bol;
-   } else if (in_old_conflicts_block && sb->buf[bol] == '\t') {
+   } else if (in_old_conflicts_block && buf[bol] == '\t') {
; /* a pathname in the conflicts block */
} else if (boc) {
/* the previous was not trailing comment */
boc = 0;
in_old_conflicts_block = 0;
}
-   bol = next_line - sb->buf;
+   bol = next_line - buf;
}
-   return boc ? sb->len - boc : 0;
+   return boc ? len - boc : 0;
 }
diff --git a/commit.h b/commit.h
index afd14f3..9c12abb 100644
--- a/commit.h
+++ b/commit.h
@@ -355,7 +355,7 @@ extern const char *find_commit_header(const char *msg, 
const char *key,
  size_t *out_len);
 
 /* Find the end of the log message, the right place for a new trailer. */
-extern int ignore_non_trailer(struct strbuf *sb);
+extern int ignore_non_trailer(const char *buf, size_t len);
 
 typedef void (*each_mergetag_fn)(struct commit *commit, struct 
commit_extra_header *extra,
 void *cb_data);
diff --git a/trailer.c b/trailer.c
index dc525e3..9d7765e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -840,7 +840,7 @@ static int find_trailer_end(struct strbuf **lines, int 
patch_start)
 
for (i = 0; i < patch_start; i++)
strbuf_addbuf(, lines[i]);
-   ignore_bytes = ignore_non_trailer();
+   ignore_bytes = ignore_non_trailer(sb.buf, sb.len);
strbuf_release();
for (i = patch_start - 1; i >= 0 && ignore_bytes > 0; i--)
ignore_bytes -= lines[i]->len;
-- 
2.8.0.rc3.226.g39d4020



[PATCH v3 3/5] trailer: avoid unnecessary splitting on lines

2016-11-02 Thread Jonathan Tan
trailer.c currently splits lines while processing a buffer (and also
rejoins lines when needing to invoke ignore_non_trailer).

Avoid such line splitting, except when generating the strings
corresponding to trailers (for ease of use by clients - a subsequent
patch will allow other components to obtain the layout of a trailer
block in a buffer, including the trailers themselves). The main purpose
of this is to make it easy to return pointers into the original buffer
(for a subsequent patch), but this also significantly reduces the number
of memory allocations required.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 194 --
 1 file changed, 100 insertions(+), 94 deletions(-)

diff --git a/trailer.c b/trailer.c
index 9d7765e..afbff4b 100644
--- a/trailer.c
+++ b/trailer.c
@@ -102,12 +102,12 @@ static int same_trailer(struct trailer_item *a, struct 
arg_item *b)
return same_token(a, b) && same_value(a, b);
 }
 
-static inline int contains_only_spaces(const char *str)
+static inline int is_blank_line(const char *str)
 {
const char *s = str;
-   while (*s && isspace(*s))
+   while (*s && *s != '\n' && isspace(*s))
s++;
-   return !*s;
+   return !*s || *s == '\n';
 }
 
 static inline void strbuf_replace(struct strbuf *sb, const char *a, const char 
*b)
@@ -700,51 +700,71 @@ static void process_command_line_args(struct list_head 
*arg_head,
free(cl_separators);
 }
 
-static struct strbuf **read_input_file(const char *file)
+static void read_input_file(struct strbuf *sb, const char *file)
 {
-   struct strbuf **lines;
-   struct strbuf sb = STRBUF_INIT;
-
if (file) {
-   if (strbuf_read_file(, file, 0) < 0)
+   if (strbuf_read_file(sb, file, 0) < 0)
die_errno(_("could not read input file '%s'"), file);
} else {
-   if (strbuf_read(, fileno(stdin), 0) < 0)
+   if (strbuf_read(sb, fileno(stdin), 0) < 0)
die_errno(_("could not read from stdin"));
}
+}
 
-   lines = strbuf_split(, '\n');
+static const char *next_line(const char *str)
+{
+   const char *nl = strchrnul(str, '\n');
+   return nl + !!*nl;
+}
 
-   strbuf_release();
+/*
+ * Return the position of the start of the last line. If len is 0, return -1.
+ */
+static int last_line(const char *buf, size_t len)
+{
+   int i;
+   if (len == 0)
+   return -1;
+   if (len == 1)
+   return 0;
+   /*
+* Skip the last character (in addition to the null terminator),
+* because if the last character is a newline, it is considered as part
+* of the last line anyway.
+*/
+   i = len - 2;
 
-   return lines;
+   for (; i >= 0; i--) {
+   if (buf[i] == '\n')
+   return i + 1;
+   }
+   return 0;
 }
 
 /*
- * Return the (0 based) index of the start of the patch or the line
- * count if there is no patch in the message.
+ * Return the position of the start of the patch or the length of str if there
+ * is no patch in the message.
  */
-static int find_patch_start(struct strbuf **lines, int count)
+static int find_patch_start(const char *str)
 {
-   int i;
+   const char *s;
 
-   /* Get the start of the patch part if any */
-   for (i = 0; i < count; i++) {
-   if (starts_with(lines[i]->buf, "---"))
-   return i;
+   for (s = str; *s; s = next_line(s)) {
+   if (starts_with(s, "---"))
+   return s - str;
}
 
-   return count;
+   return s - str;
 }
 
 /*
- * Return the (0 based) index of the first trailer line or count if
- * there are no trailers. Trailers are searched only in the lines from
- * index (count - 1) down to index 0.
+ * Return the position of the first trailer line or len if there are no
+ * trailers.
  */
-static int find_trailer_start(struct strbuf **lines, int count)
+static int find_trailer_start(const char *buf, size_t len)
 {
-   int start, end_of_title, only_spaces = 1;
+   const char *s;
+   int end_of_title, l, only_spaces = 1;
int recognized_prefix = 0, trailer_lines = 0, non_trailer_lines = 0;
/*
 * Number of possible continuation lines encountered. This will be
@@ -756,13 +776,13 @@ static int find_trailer_start(struct strbuf **lines, int 
count)
int possible_continuation_lines = 0;
 
/* The first paragraph is the title and cannot be trailers */
-   for (start = 0; start < count; start++) {
-   if (lines[start]->buf[0] == comment_line_char)
+   for (s = buf; s < buf + len; s = next_line(s)) {
+   if (s[0] == comment_line_char)
  

[PATCH v3 5/5] sequencer: use trailer's trailer layout

2016-11-02 Thread Jonathan Tan
Make sequencer use trailer.c's trailer layout definition, as opposed to
parsing the footer by itself. This makes "commit -s", "cherry-pick -x",
and "format-patch --signoff" consistent with trailer, allowing
non-trailer lines and multiple-line trailers in trailer blocks under
certain conditions, and therefore suppressing the extra newline in those
cases.

Consistency with trailer extends to respecting trailer configs.  Tests
have been included to show that.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 sequencer.c  | 75 +---
 t/t3511-cherry-pick-x.sh | 16 +--
 t/t4014-format-patch.sh  | 37 
 t/t7501-commit.sh| 36 +++
 4 files changed, 95 insertions(+), 69 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 5fd75f3..d64c973 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -16,6 +16,7 @@
 #include "refs.h"
 #include "argv-array.h"
 #include "quote.h"
+#include "trailer.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -56,30 +57,6 @@ static const char *get_todo_path(const struct replay_opts 
*opts)
return git_path_todo_file();
 }
 
-static int is_rfc2822_line(const char *buf, int len)
-{
-   int i;
-
-   for (i = 0; i < len; i++) {
-   int ch = buf[i];
-   if (ch == ':')
-   return 1;
-   if (!isalnum(ch) && ch != '-')
-   break;
-   }
-
-   return 0;
-}
-
-static int is_cherry_picked_from_line(const char *buf, int len)
-{
-   /*
-* We only care that it looks roughly like (cherry picked from ...)
-*/
-   return len > strlen(cherry_picked_prefix) + 1 &&
-   starts_with(buf, cherry_picked_prefix) && buf[len - 1] == ')';
-}
-
 /*
  * Returns 0 for non-conforming footer
  * Returns 1 for conforming footer
@@ -89,49 +66,25 @@ static int is_cherry_picked_from_line(const char *buf, int 
len)
 static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
int ignore_footer)
 {
-   char prev;
-   int i, k;
-   int len = sb->len - ignore_footer;
-   const char *buf = sb->buf;
-   int found_sob = 0;
-
-   /* footer must end with newline */
-   if (!len || buf[len - 1] != '\n')
-   return 0;
+   struct trailer_info info;
+   int i;
+   int found_sob = 0, found_sob_last = 0;
 
-   prev = '\0';
-   for (i = len - 1; i > 0; i--) {
-   char ch = buf[i];
-   if (prev == '\n' && ch == '\n') /* paragraph break */
-   break;
-   prev = ch;
-   }
+   trailer_info_get(, sb->buf);
 
-   /* require at least one blank line */
-   if (prev != '\n' || buf[i] != '\n')
+   if (info.trailer_start == info.trailer_end)
return 0;
 
-   /* advance to start of last paragraph */
-   while (i < len - 1 && buf[i] == '\n')
-   i++;
-
-   for (; i < len; i = k) {
-   int found_rfc2822;
-
-   for (k = i; k < len && buf[k] != '\n'; k++)
-   ; /* do nothing */
-   k++;
+   for (i = 0; i < info.trailer_nr; i++)
+   if (sob && !strncmp(info.trailers[i], sob->buf, sob->len)) {
+   found_sob = 1;
+   if (i == info.trailer_nr - 1)
+   found_sob_last = 1;
+   }
 
-   found_rfc2822 = is_rfc2822_line(buf + i, k - i - 1);
-   if (found_rfc2822 && sob &&
-   !strncmp(buf + i, sob->buf, sob->len))
-   found_sob = k;
+   trailer_info_release();
 
-   if (!(found_rfc2822 ||
- is_cherry_picked_from_line(buf + i, k - i - 1)))
-   return 0;
-   }
-   if (found_sob == i)
+   if (found_sob_last)
return 3;
if (found_sob)
return 2;
diff --git a/t/t3511-cherry-pick-x.sh b/t/t3511-cherry-pick-x.sh
index 9cce5ae..bf0a5c9 100755
--- a/t/t3511-cherry-pick-x.sh
+++ b/t/t3511-cherry-pick-x.sh
@@ -25,9 +25,8 @@ Signed-off-by: B.U. Thor <but...@example.com>"
 
 mesg_broken_footer="$mesg_no_footer
 
-The signed-off-by string should begin with the words Signed-off-by followed
-by a colon and space, and then the signers name and email address. e.g.
-Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+This is not recognized as a footer because Myfooter is not a recognized token.
+Myfooter: A.U. Thor <aut...@example.com>"
 
 mesg_with_footer_sob="$mesg_with_footer
 Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
@@ -112,6 +111,17 @@ test_e

[PATCH v3 4/5] trailer: have function to describe trailer layout

2016-11-02 Thread Jonathan Tan
Create a function that, taking a string, describes the position of its
trailer block (if available) and the contents thereof, and make trailer
use it. This makes it easier for other Git components, in the future, to
interpret trailer blocks in the same way as trailer.

In a subsequent patch, another component will be made to use this.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 118 +++---
 trailer.h |  25 +
 2 files changed, 107 insertions(+), 36 deletions(-)

diff --git a/trailer.c b/trailer.c
index afbff4b..bc6893b 100644
--- a/trailer.c
+++ b/trailer.c
@@ -46,6 +46,8 @@ static LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
+static int configured;
+
 #define TRAILER_ARG_STRING "$ARG"
 
 static const char *git_generated_prefixes[] = {
@@ -546,6 +548,17 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
+static void ensure_configured(void)
+{
+   if (configured)
+   return;
+
+   /* Default config must be setup first */
+   git_config(git_trailer_default_config, NULL);
+   git_config(git_trailer_config, NULL);
+   configured = 1;
+}
+
 static const char *token_from_item(struct arg_item *item, char *tok)
 {
if (item->conf.key)
@@ -873,59 +886,43 @@ static int process_input_file(FILE *outfile,
  const char *str,
  struct list_head *head)
 {
-   int patch_start, trailer_start, trailer_end;
+   struct trailer_info info;
struct strbuf tok = STRBUF_INIT;
struct strbuf val = STRBUF_INIT;
-   struct trailer_item *last = NULL;
-   struct strbuf *trailer, **trailer_lines, **ptr;
+   int i;
 
-   patch_start = find_patch_start(str);
-   trailer_end = find_trailer_end(str, patch_start);
-   trailer_start = find_trailer_start(str, trailer_end);
+   trailer_info_get(, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, trailer_start, outfile);
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!ends_with_blank_line(str, trailer_start))
+   if (!info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
-   /* Parse trailer lines */
-   trailer_lines = strbuf_split_buf(str + trailer_start,
-trailer_end - trailer_start,
-'\n',
-0);
-   for (ptr = trailer_lines; *ptr; ptr++) {
+   for (i = 0; i < info.trailer_nr; i++) {
int separator_pos;
-   trailer = *ptr;
-   if (trailer->buf[0] == comment_line_char)
-   continue;
-   if (last && isspace(trailer->buf[0])) {
-   struct strbuf sb = STRBUF_INIT;
-   strbuf_addf(, "%s\n%s", last->value, trailer->buf);
-   strbuf_strip_suffix(, "\n");
-   free(last->value);
-   last->value = strbuf_detach(, NULL);
+   char *trailer = info.trailers[i];
+   if (trailer[0] == comment_line_char)
continue;
-   }
-   separator_pos = find_separator(trailer->buf, separators);
+   separator_pos = find_separator(trailer, separators);
if (separator_pos >= 1) {
-   parse_trailer(, , NULL, trailer->buf,
+   parse_trailer(, , NULL, trailer,
  separator_pos);
-   last = add_trailer_item(head,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   add_trailer_item(head,
+strbuf_detach(, NULL),
+strbuf_detach(, NULL));
} else {
-   strbuf_addbuf(, trailer);
+   strbuf_addstr(, trailer);
strbuf_strip_suffix(, "\n");
add_trailer_item(head,
 NULL,
 strbuf_detach(, NULL));
-   last = NULL;
}
}
-   strbuf_list_free(trailer_lines);
 
-   return trailer_end;
+   trailer_info_release();
+
+   return info.trailer_end - str;
 }
 
 static void free_all(struct list_head *head)
@@ -976,9 +973,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
int trailer_end;
FILE *outfile = stdout;
 
-   /* Default config must be setup first */
-   git_config

[PATCH v3 0/5] Make other git commands use trailer layout

2016-11-02 Thread Jonathan Tan
This is the same as v2 except that in 1/5, the comment about
find_separators has been moved from the commit message to the code
itself. Also, a trailing whitespace and unused variable fix.

Jonathan Tan (5):
  trailer: be stricter in parsing separators
  commit: make ignore_non_trailer take buf/len
  trailer: avoid unnecessary splitting on lines
  trailer: have function to describe trailer layout
  sequencer: use trailer's trailer layout

 builtin/commit.c |   2 +-
 commit.c |  22 ++--
 commit.h |   2 +-
 sequencer.c  |  75 +++-
 t/t3511-cherry-pick-x.sh |  16 ++-
 t/t4014-format-patch.sh  |  37 +-
 t/t7501-commit.sh|  36 ++
 trailer.c| 299 +--
 trailer.h|  25 
 9 files changed, 316 insertions(+), 198 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH v3 1/5] trailer: be stricter in parsing separators

2016-11-02 Thread Jonathan Tan
Currently, a line is interpreted to be a trailer line if it contains a
separator. Make parsing stricter by requiring the text on the left of
the separator, if not the empty string, to be of the "" form.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/trailer.c b/trailer.c
index f0ecde2..dc525e3 100644
--- a/trailer.c
+++ b/trailer.c
@@ -563,15 +563,30 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
 }
 
 /*
- * Return the location of the first separator in line, or -1 if there is no
- * separator.
+ * If the given line is of the form
+ * "..." or "...", return the
+ * location of the separator. Otherwise, return -1.
+ *
+ * The separator-starts-line case (in which this function returns 0) is
+ * distinguished from the non-well-formed-line case (in which this function
+ * returns -1) because some callers of this function need such a distinction.
  */
 static int find_separator(const char *line, const char *separators)
 {
-   int loc = strcspn(line, separators);
-   if (!line[loc])
-   return -1;
-   return loc;
+   int whitespace_found = 0;
+   const char *c;
+   for (c = line; *c; c++) {
+   if (strchr(separators, *c))
+   return c - line;
+   if (!whitespace_found && (isalnum(*c) || *c == '-'))
+   continue;
+   if (c != line && (*c == ' ' || *c == '\t')) {
+   whitespace_found = 1;
+   continue;
+   }
+   break;
+   }
+   return -1;
 }
 
 /*
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH v2 1/5] trailer: be stricter in parsing separators

2016-11-01 Thread Jonathan Tan

On 11/01/2016 01:37 PM, Junio C Hamano wrote:

Junio C Hamano <gits...@pobox.com> writes:


Jonathan Tan <jonathanta...@google.com> writes:


Currently, a line is interpreted to be a trailer line if it contains a
separator. Make parsing stricter by requiring the text on the left of
the separator, if not the empty string, to be of the "" form.


Hmph.  The optional whitespace is to allow for what kind of line?

It is not for "Signed off by:" that is a misspelt "Signed-off-by:";
it may not hurt but I do not think of a case that would be useful
offhand.


This is to allow trailers of the form "Fix #42" (mentioned in the 
git-interpret-trailers documentation and also in the test).



Other than this "Hmph" (which is not an objection---just something
that the reviewer did not understand), the rest looked good to me.

Will re-queue.

Thanks.


Thanks!


[PATCH v2 0/5] Make other git commands use trailer layout

2016-11-01 Thread Jonathan Tan
Thanks for all your comments.

This patch set is now built off master (since jt/trailer-with-cruft is
merged).

I couldn't think of an easy way to clearly decide if a token with spaces
should be considered a token, so I've tightened the restrictions. One
benefit is that we no longer need to create temporary strings that
include '\n' to be passed into the find_separator method.

In 2/4 (now 3/5), I've also changed some variable names as requested
(e.g. sb -> input, and un-did some others).

Jonathan Tan (5):
  trailer: be stricter in parsing separators
  commit: make ignore_non_trailer take buf/len
  trailer: avoid unnecessary splitting on lines
  trailer: have function to describe trailer layout
  sequencer: use trailer's trailer layout

 builtin/commit.c |   2 +-
 commit.c |  22 ++--
 commit.h |   2 +-
 sequencer.c  |  75 +++-
 t/t3511-cherry-pick-x.sh |  16 ++-
 t/t4014-format-patch.sh  |  37 +-
 t/t7501-commit.sh|  36 ++
 trailer.c| 296 ---
 trailer.h|  25 
 9 files changed, 313 insertions(+), 198 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 1/5] trailer: be stricter in parsing separators

2016-11-01 Thread Jonathan Tan
Currently, a line is interpreted to be a trailer line if it contains a
separator. Make parsing stricter by requiring the text on the left of
the separator, if not the empty string, to be of the "" form.

(The find_separator function distinguishes the no-separator case from
the separator-starts-line case because some callers of this function
need such a distinction.)

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/trailer.c b/trailer.c
index f0ecde2..0ee634f 100644
--- a/trailer.c
+++ b/trailer.c
@@ -563,15 +563,26 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
 }
 
 /*
- * Return the location of the first separator in line, or -1 if there is no
- * separator.
+ * If the given line is of the form 
+ * "..." or "...", return the
+ * location of the separator. Otherwise, return -1.
  */
 static int find_separator(const char *line, const char *separators)
 {
-   int loc = strcspn(line, separators);
-   if (!line[loc])
-   return -1;
-   return loc;
+   int whitespace_found = 0;
+   const char *c;
+   for (c = line; *c; c++) {
+   if (strchr(separators, *c))
+   return c - line;
+   if (!whitespace_found && (isalnum(*c) || *c == '-'))
+   continue;
+   if (c != line && (*c == ' ' || *c == '\t')) {
+   whitespace_found = 1;
+   continue;
+   }
+   break;
+   }
+   return -1;
 }
 
 /*
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 3/5] trailer: avoid unnecessary splitting on lines

2016-11-01 Thread Jonathan Tan
trailer.c currently splits lines while processing a buffer (and also
rejoins lines when needing to invoke ignore_non_trailer).

Avoid such line splitting, except when generating the strings
corresponding to trailers (for ease of use by clients - a subsequent
patch will allow other components to obtain the layout of a trailer
block in a buffer, including the trailers themselves). The main purpose
of this is to make it easy to return pointers into the original buffer
(for a subsequent patch), but this also significantly reduces the number
of memory allocations required.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 195 --
 1 file changed, 101 insertions(+), 94 deletions(-)

diff --git a/trailer.c b/trailer.c
index 04edab2..f5427ec 100644
--- a/trailer.c
+++ b/trailer.c
@@ -102,12 +102,12 @@ static int same_trailer(struct trailer_item *a, struct 
arg_item *b)
return same_token(a, b) && same_value(a, b);
 }
 
-static inline int contains_only_spaces(const char *str)
+static inline int is_blank_line(const char *str)
 {
const char *s = str;
-   while (*s && isspace(*s))
+   while (*s && *s != '\n' && isspace(*s))
s++;
-   return !*s;
+   return !*s || *s == '\n';
 }
 
 static inline void strbuf_replace(struct strbuf *sb, const char *a, const char 
*b)
@@ -696,51 +696,71 @@ static void process_command_line_args(struct list_head 
*arg_head,
free(cl_separators);
 }
 
-static struct strbuf **read_input_file(const char *file)
+static void read_input_file(struct strbuf *sb, const char *file)
 {
-   struct strbuf **lines;
-   struct strbuf sb = STRBUF_INIT;
-
if (file) {
-   if (strbuf_read_file(, file, 0) < 0)
+   if (strbuf_read_file(sb, file, 0) < 0)
die_errno(_("could not read input file '%s'"), file);
} else {
-   if (strbuf_read(, fileno(stdin), 0) < 0)
+   if (strbuf_read(sb, fileno(stdin), 0) < 0)
die_errno(_("could not read from stdin"));
}
+}
 
-   lines = strbuf_split(, '\n');
+static const char *next_line(const char *str)
+{
+   const char *nl = strchrnul(str, '\n');
+   return nl + !!*nl;
+}
 
-   strbuf_release();
+/*
+ * Return the position of the start of the last line. If len is 0, return -1.
+ */
+static int last_line(const char *buf, size_t len)
+{
+   int i;
+   if (len == 0)
+   return -1;
+   if (len == 1)
+   return 0;
+   /*
+* Skip the last character (in addition to the null terminator),
+* because if the last character is a newline, it is considered as part
+* of the last line anyway.
+*/
+   i = len - 2;
 
-   return lines;
+   for (; i >= 0; i--) {
+   if (buf[i] == '\n')
+   return i + 1;
+   }
+   return 0;
 }
 
 /*
- * Return the (0 based) index of the start of the patch or the line
- * count if there is no patch in the message.
+ * Return the position of the start of the patch or the length of str if there
+ * is no patch in the message.
  */
-static int find_patch_start(struct strbuf **lines, int count)
+static int find_patch_start(const char *str)
 {
-   int i;
+   const char *s;
 
-   /* Get the start of the patch part if any */
-   for (i = 0; i < count; i++) {
-   if (starts_with(lines[i]->buf, "---"))
-   return i;
+   for (s = str; *s; s = next_line(s)) {
+   if (starts_with(s, "---"))
+   return s - str;
}
 
-   return count;
+   return s - str;
 }
 
 /*
- * Return the (0 based) index of the first trailer line or count if
- * there are no trailers. Trailers are searched only in the lines from
- * index (count - 1) down to index 0.
+ * Return the position of the first trailer line or len if there are no
+ * trailers.
  */
-static int find_trailer_start(struct strbuf **lines, int count)
+static int find_trailer_start(const char *buf, size_t len)
 {
-   int start, end_of_title, only_spaces = 1;
+   const char *s;
+   int end_of_title, l, only_spaces = 1;
int recognized_prefix = 0, trailer_lines = 0, non_trailer_lines = 0;
/*
 * Number of possible continuation lines encountered. This will be
@@ -750,15 +770,16 @@ static int find_trailer_start(struct strbuf **lines, int 
count)
 * are to be considered non-trailers).
 */
int possible_continuation_lines = 0;
+   int ret;
 
/* The first paragraph is the title and cannot be trailers */
-   for (start = 0; start < count; start++) {
-   if (lines[start]->buf[0] == comment_line_char)
+   for (s = buf; s < bu

[PATCH v2 4/5] trailer: have function to describe trailer layout

2016-11-01 Thread Jonathan Tan
Create a function that, taking a string, describes the position of its
trailer block (if available) and the contents thereof, and make trailer
use it. This makes it easier for other Git components, in the future, to
interpret trailer blocks in the same way as trailer.

In a subsequent patch, another component will be made to use this.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 118 +++---
 trailer.h |  25 +
 2 files changed, 107 insertions(+), 36 deletions(-)

diff --git a/trailer.c b/trailer.c
index f5427ec..7265a50 100644
--- a/trailer.c
+++ b/trailer.c
@@ -46,6 +46,8 @@ static LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
+static int configured;
+
 #define TRAILER_ARG_STRING "$ARG"
 
 static const char *git_generated_prefixes[] = {
@@ -546,6 +548,17 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
+static void ensure_configured(void)
+{
+   if (configured)
+   return;
+
+   /* Default config must be setup first */
+   git_config(git_trailer_default_config, NULL);
+   git_config(git_trailer_config, NULL);
+   configured = 1;
+}
+
 static const char *token_from_item(struct arg_item *item, char *tok)
 {
if (item->conf.key)
@@ -870,59 +883,43 @@ static int process_input_file(FILE *outfile,
  const char *str,
  struct list_head *head)
 {
-   int patch_start, trailer_start, trailer_end;
+   struct trailer_info info;
struct strbuf tok = STRBUF_INIT;
struct strbuf val = STRBUF_INIT;
-   struct trailer_item *last = NULL;
-   struct strbuf *trailer, **trailer_lines, **ptr;
+   int i;
 
-   patch_start = find_patch_start(str);
-   trailer_end = find_trailer_end(str, patch_start);
-   trailer_start = find_trailer_start(str, trailer_end);
+   trailer_info_get(, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, trailer_start, outfile);
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!ends_with_blank_line(str, trailer_start))
+   if (!info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
-   /* Parse trailer lines */
-   trailer_lines = strbuf_split_buf(str + trailer_start,
-trailer_end - trailer_start,
-'\n',
-0);
-   for (ptr = trailer_lines; *ptr; ptr++) {
+   for (i = 0; i < info.trailer_nr; i++) {
int separator_pos;
-   trailer = *ptr;
-   if (trailer->buf[0] == comment_line_char)
-   continue;
-   if (last && isspace(trailer->buf[0])) {
-   struct strbuf sb = STRBUF_INIT;
-   strbuf_addf(, "%s\n%s", last->value, trailer->buf);
-   strbuf_strip_suffix(, "\n");
-   free(last->value);
-   last->value = strbuf_detach(, NULL);
+   char *trailer = info.trailers[i];
+   if (trailer[0] == comment_line_char)
continue;
-   }
-   separator_pos = find_separator(trailer->buf, separators);
+   separator_pos = find_separator(trailer, separators);
if (separator_pos >= 1) {
-   parse_trailer(, , NULL, trailer->buf,
+   parse_trailer(, , NULL, trailer,
  separator_pos);
-   last = add_trailer_item(head,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   add_trailer_item(head,
+strbuf_detach(, NULL),
+strbuf_detach(, NULL));
} else {
-   strbuf_addbuf(, trailer);
+   strbuf_addstr(, trailer);
strbuf_strip_suffix(, "\n");
add_trailer_item(head,
 NULL,
 strbuf_detach(, NULL));
-   last = NULL;
}
}
-   strbuf_list_free(trailer_lines);
 
-   return trailer_end;
+   trailer_info_release();
+
+   return info.trailer_end - str;
 }
 
 static void free_all(struct list_head *head)
@@ -973,9 +970,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
int trailer_end;
FILE *outfile = stdout;
 
-   /* Default config must be setup first */
-   git_config

[PATCH v2 2/5] commit: make ignore_non_trailer take buf/len

2016-11-01 Thread Jonathan Tan
Make ignore_non_trailer take a buf/len pair instead of struct strbuf.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/commit.c |  2 +-
 commit.c | 22 +++---
 commit.h |  2 +-
 trailer.c|  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 8976c3d..887ccc7 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -790,7 +790,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
strbuf_stripspace(, 0);
 
if (signoff)
-   append_signoff(, ignore_non_trailer(), 0);
+   append_signoff(, ignore_non_trailer(sb.buf, sb.len), 0);
 
if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
die_errno(_("could not write commit template"));
diff --git a/commit.c b/commit.c
index 856fd4a..2cf8515 100644
--- a/commit.c
+++ b/commit.c
@@ -1649,7 +1649,7 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
 }
 
 /*
- * Inspect sb and determine the true "end" of the log message, in
+ * Inspect the given string and determine the true "end" of the log message, in
  * order to find where to put a new Signed-off-by: line.  Ignored are
  * trailing comment lines and blank lines, and also the traditional
  * "Conflicts:" block that is not commented out, so that we can use
@@ -1659,37 +1659,37 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-int ignore_non_trailer(struct strbuf *sb)
+int ignore_non_trailer(const char *buf, size_t len)
 {
int boc = 0;
int bol = 0;
int in_old_conflicts_block = 0;
 
-   while (bol < sb->len) {
-   char *next_line;
+   while (bol < len) {
+   const char *next_line = memchr(buf + bol, '\n', len - bol);
 
-   if (!(next_line = memchr(sb->buf + bol, '\n', sb->len - bol)))
-   next_line = sb->buf + sb->len;
+   if (!next_line)
+   next_line = buf + len;
else
next_line++;
 
-   if (sb->buf[bol] == comment_line_char || sb->buf[bol] == '\n') {
+   if (buf[bol] == comment_line_char || buf[bol] == '\n') {
/* is this the first of the run of comments? */
if (!boc)
boc = bol;
/* otherwise, it is just continuing */
-   } else if (starts_with(sb->buf + bol, "Conflicts:\n")) {
+   } else if (starts_with(buf + bol, "Conflicts:\n")) {
in_old_conflicts_block = 1;
if (!boc)
boc = bol;
-   } else if (in_old_conflicts_block && sb->buf[bol] == '\t') {
+   } else if (in_old_conflicts_block && buf[bol] == '\t') {
; /* a pathname in the conflicts block */
} else if (boc) {
/* the previous was not trailing comment */
boc = 0;
in_old_conflicts_block = 0;
}
-   bol = next_line - sb->buf;
+   bol = next_line - buf;
}
-   return boc ? sb->len - boc : 0;
+   return boc ? len - boc : 0;
 }
diff --git a/commit.h b/commit.h
index afd14f3..9c12abb 100644
--- a/commit.h
+++ b/commit.h
@@ -355,7 +355,7 @@ extern const char *find_commit_header(const char *msg, 
const char *key,
  size_t *out_len);
 
 /* Find the end of the log message, the right place for a new trailer. */
-extern int ignore_non_trailer(struct strbuf *sb);
+extern int ignore_non_trailer(const char *buf, size_t len);
 
 typedef void (*each_mergetag_fn)(struct commit *commit, struct 
commit_extra_header *extra,
 void *cb_data);
diff --git a/trailer.c b/trailer.c
index 0ee634f..04edab2 100644
--- a/trailer.c
+++ b/trailer.c
@@ -836,7 +836,7 @@ static int find_trailer_end(struct strbuf **lines, int 
patch_start)
 
for (i = 0; i < patch_start; i++)
strbuf_addbuf(, lines[i]);
-   ignore_bytes = ignore_non_trailer();
+   ignore_bytes = ignore_non_trailer(sb.buf, sb.len);
strbuf_release();
for (i = patch_start - 1; i >= 0 && ignore_bytes > 0; i--)
ignore_bytes -= lines[i]->len;
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 5/5] sequencer: use trailer's trailer layout

2016-11-01 Thread Jonathan Tan
Make sequencer use trailer.c's trailer layout definition, as opposed to
parsing the footer by itself. This makes "commit -s", "cherry-pick -x",
and "format-patch --signoff" consistent with trailer, allowing
non-trailer lines and multiple-line trailers in trailer blocks under
certain conditions, and therefore suppressing the extra newline in those
cases.

Consistency with trailer extends to respecting trailer configs.  Tests
have been included to show that.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 sequencer.c  | 75 +---
 t/t3511-cherry-pick-x.sh | 16 +--
 t/t4014-format-patch.sh  | 37 
 t/t7501-commit.sh| 36 +++
 4 files changed, 95 insertions(+), 69 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 5fd75f3..d64c973 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -16,6 +16,7 @@
 #include "refs.h"
 #include "argv-array.h"
 #include "quote.h"
+#include "trailer.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -56,30 +57,6 @@ static const char *get_todo_path(const struct replay_opts 
*opts)
return git_path_todo_file();
 }
 
-static int is_rfc2822_line(const char *buf, int len)
-{
-   int i;
-
-   for (i = 0; i < len; i++) {
-   int ch = buf[i];
-   if (ch == ':')
-   return 1;
-   if (!isalnum(ch) && ch != '-')
-   break;
-   }
-
-   return 0;
-}
-
-static int is_cherry_picked_from_line(const char *buf, int len)
-{
-   /*
-* We only care that it looks roughly like (cherry picked from ...)
-*/
-   return len > strlen(cherry_picked_prefix) + 1 &&
-   starts_with(buf, cherry_picked_prefix) && buf[len - 1] == ')';
-}
-
 /*
  * Returns 0 for non-conforming footer
  * Returns 1 for conforming footer
@@ -89,49 +66,25 @@ static int is_cherry_picked_from_line(const char *buf, int 
len)
 static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
int ignore_footer)
 {
-   char prev;
-   int i, k;
-   int len = sb->len - ignore_footer;
-   const char *buf = sb->buf;
-   int found_sob = 0;
-
-   /* footer must end with newline */
-   if (!len || buf[len - 1] != '\n')
-   return 0;
+   struct trailer_info info;
+   int i;
+   int found_sob = 0, found_sob_last = 0;
 
-   prev = '\0';
-   for (i = len - 1; i > 0; i--) {
-   char ch = buf[i];
-   if (prev == '\n' && ch == '\n') /* paragraph break */
-   break;
-   prev = ch;
-   }
+   trailer_info_get(, sb->buf);
 
-   /* require at least one blank line */
-   if (prev != '\n' || buf[i] != '\n')
+   if (info.trailer_start == info.trailer_end)
return 0;
 
-   /* advance to start of last paragraph */
-   while (i < len - 1 && buf[i] == '\n')
-   i++;
-
-   for (; i < len; i = k) {
-   int found_rfc2822;
-
-   for (k = i; k < len && buf[k] != '\n'; k++)
-   ; /* do nothing */
-   k++;
+   for (i = 0; i < info.trailer_nr; i++)
+   if (sob && !strncmp(info.trailers[i], sob->buf, sob->len)) {
+   found_sob = 1;
+   if (i == info.trailer_nr - 1)
+   found_sob_last = 1;
+   }
 
-   found_rfc2822 = is_rfc2822_line(buf + i, k - i - 1);
-   if (found_rfc2822 && sob &&
-   !strncmp(buf + i, sob->buf, sob->len))
-   found_sob = k;
+   trailer_info_release();
 
-   if (!(found_rfc2822 ||
- is_cherry_picked_from_line(buf + i, k - i - 1)))
-   return 0;
-   }
-   if (found_sob == i)
+   if (found_sob_last)
return 3;
if (found_sob)
return 2;
diff --git a/t/t3511-cherry-pick-x.sh b/t/t3511-cherry-pick-x.sh
index 9cce5ae..bf0a5c9 100755
--- a/t/t3511-cherry-pick-x.sh
+++ b/t/t3511-cherry-pick-x.sh
@@ -25,9 +25,8 @@ Signed-off-by: B.U. Thor <but...@example.com>"
 
 mesg_broken_footer="$mesg_no_footer
 
-The signed-off-by string should begin with the words Signed-off-by followed
-by a colon and space, and then the signers name and email address. e.g.
-Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+This is not recognized as a footer because Myfooter is not a recognized token.
+Myfooter: A.U. Thor <aut...@example.com>"
 
 mesg_with_footer_sob="$mesg_with_footer
 Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
@@ -112,6 +111,17 @@ test_e

Re: [PATCH 4/4] sequencer: use trailer's trailer layout

2016-11-01 Thread Jonathan Tan

On 10/31/2016 06:11 PM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:

diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index ba4902d..635b394 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1277,8 +1277,7 @@ EOF
 4:Subject: [PATCH] subject
 8:
 9:I want to mention about Signed-off-by: here.
-10:
-11:Signed-off-by: C O Mitter <commit...@example.com>
+10:Signed-off-by: C O Mitter <commit...@example.com>
 EOF
test_cmp expected actual
 '


The original log message is a single-liner subject line, blank, "I
want to mention..." and when asked to append S-o-b:, we would want
to see a blank before the added S-o-b, no?

This seems a bit weird.


This is because the "I want to mention" block has 100% trailer lines 
(since its only line contains a colon). We could forbid spaces in 
trailer field names, but as you said [1], it might be better to allow 
them since users might include them.


The original sequencer.c interpreted this block as not a trailer block, 
because it only accepted alphanumeric characters or '-' before the colon 
(and no spaces) - hence the difference in behavior.


[1] <xmqqbmyhr4vt@gitster.mtv.corp.google.com>


[PATCH 3/4] trailer: have function to describe trailer layout

2016-10-28 Thread Jonathan Tan
Create a function that, taking a string, describes the position of its
trailer block (if available) and the contents thereof, and make trailer
use it. This makes it easier for other Git components, in the future, to
interpret trailer blocks in the same way as trailer.

In a subsequent patch, another component will be made to use this.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 120 +++---
 trailer.h |  25 +
 2 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/trailer.c b/trailer.c
index d4d9e10..2f5c815 100644
--- a/trailer.c
+++ b/trailer.c
@@ -46,6 +46,8 @@ static LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
+static int configured = 0;
+
 #define TRAILER_ARG_STRING "$ARG"
 
 static const char *git_generated_prefixes[] = {
@@ -549,6 +551,17 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
+static void ensure_configured(void)
+{
+   if (configured)
+   return;
+
+   /* Default config must be setup first */
+   git_config(git_trailer_default_config, NULL);
+   git_config(git_trailer_config, NULL);
+   configured = 1;
+}
+
 static const char *token_from_item(struct arg_item *item, char *tok)
 {
if (item->conf.key)
@@ -872,59 +885,43 @@ static int process_input_file(FILE *outfile,
  const char *str,
  struct list_head *head)
 {
-   int patch_start, trailer_start, trailer_end;
+   struct trailer_info info;
struct strbuf tok = STRBUF_INIT;
struct strbuf val = STRBUF_INIT;
-   struct trailer_item *last = NULL;
-   struct strbuf *trailer, **trailer_lines, **ptr;
+   int i;
 
-   patch_start = find_patch_start(str);
-   trailer_end = find_trailer_end(str, patch_start);
-   trailer_start = find_trailer_start(str, trailer_end);
+   trailer_info_get(, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, trailer_start, outfile);
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!ends_with_blank_line(str, trailer_start))
+   if (!info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
-   /* Parse trailer lines */
-   trailer_lines = strbuf_split_buf(str + trailer_start, 
-trailer_end - trailer_start,
-'\n',
-0);
-   for (ptr = trailer_lines; *ptr; ptr++) {
+   for (i = 0; i < info.trailer_nr; i++) {
int separator_pos;
-   trailer = *ptr;
-   if (trailer->buf[0] == comment_line_char)
+   char *trailer = info.trailers[i];
+   if (trailer[0] == comment_line_char)
continue;
-   if (last && isspace(trailer->buf[0])) {
-   struct strbuf sb = STRBUF_INIT;
-   strbuf_addf(, "%s\n%s", last->value, trailer->buf);
-   strbuf_strip_suffix(, "\n");
-   free(last->value);
-   last->value = strbuf_detach(, NULL);
-   continue;
-   }
-   separator_pos = find_separator(trailer->buf, separators);
+   separator_pos = find_separator(trailer, separators);
if (separator_pos >= 1) {
-   parse_trailer(, , NULL, trailer->buf,
- separator_pos);
-   last = add_trailer_item(head,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   parse_trailer(, , NULL, trailer,
+ separator_pos);
+   add_trailer_item(head,
+strbuf_detach(, NULL),
+strbuf_detach(, NULL));
} else {
-   strbuf_addbuf(, trailer);
+   strbuf_addstr(, trailer);
strbuf_strip_suffix(, "\n");
add_trailer_item(head,
 NULL,
 strbuf_detach(, NULL));
-   last = NULL;
}
}
-   strbuf_list_free(trailer_lines);
 
-   return trailer_end;
+   trailer_info_release();
+
+   return info.trailer_end - str;
 }
 
 static void free_all(struct list_head *head)
@@ -975,9 +972,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
int trailer_end;
FILE *outfile = stdout;
 
-   /* Def

[PATCH 4/4] sequencer: use trailer's trailer layout

2016-10-28 Thread Jonathan Tan
Make sequencer use trailer.c's trailer layout definition, as opposed to
parsing the footer by itself. This makes "commit -s", "cherry-pick -x",
and "format-patch --signoff" consistent with trailer, allowing
non-trailer lines and multiple-line trailers in trailer blocks under
certain conditions, and therefore suppressing the extra newline in those
cases.

Consistency with trailer extends to respecting trailer configs.  Tests
have been included to show that.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 sequencer.c  | 75 +---
 t/t3511-cherry-pick-x.sh | 16 +--
 t/t4014-format-patch.sh  | 40 +-
 t/t7501-commit.sh| 36 +++
 4 files changed, 96 insertions(+), 71 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index eec8a60..ec0157d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -15,6 +15,7 @@
 #include "merge-recursive.h"
 #include "refs.h"
 #include "argv-array.h"
+#include "trailer.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -26,30 +27,6 @@ static GIT_PATH_FUNC(git_path_opts_file, SEQ_OPTS_FILE)
 static GIT_PATH_FUNC(git_path_seq_dir, SEQ_DIR)
 static GIT_PATH_FUNC(git_path_head_file, SEQ_HEAD_FILE)
 
-static int is_rfc2822_line(const char *buf, int len)
-{
-   int i;
-
-   for (i = 0; i < len; i++) {
-   int ch = buf[i];
-   if (ch == ':')
-   return 1;
-   if (!isalnum(ch) && ch != '-')
-   break;
-   }
-
-   return 0;
-}
-
-static int is_cherry_picked_from_line(const char *buf, int len)
-{
-   /*
-* We only care that it looks roughly like (cherry picked from ...)
-*/
-   return len > strlen(cherry_picked_prefix) + 1 &&
-   starts_with(buf, cherry_picked_prefix) && buf[len - 1] == ')';
-}
-
 /*
  * Returns 0 for non-conforming footer
  * Returns 1 for conforming footer
@@ -59,49 +36,25 @@ static int is_cherry_picked_from_line(const char *buf, int 
len)
 static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
int ignore_footer)
 {
-   char prev;
-   int i, k;
-   int len = sb->len - ignore_footer;
-   const char *buf = sb->buf;
-   int found_sob = 0;
-
-   /* footer must end with newline */
-   if (!len || buf[len - 1] != '\n')
-   return 0;
+   struct trailer_info info;
+   int i;
+   int found_sob = 0, found_sob_last = 0;
 
-   prev = '\0';
-   for (i = len - 1; i > 0; i--) {
-   char ch = buf[i];
-   if (prev == '\n' && ch == '\n') /* paragraph break */
-   break;
-   prev = ch;
-   }
+   trailer_info_get(, sb->buf);
 
-   /* require at least one blank line */
-   if (prev != '\n' || buf[i] != '\n')
+   if (info.trailer_start == info.trailer_end)
return 0;
 
-   /* advance to start of last paragraph */
-   while (i < len - 1 && buf[i] == '\n')
-   i++;
-
-   for (; i < len; i = k) {
-   int found_rfc2822;
-
-   for (k = i; k < len && buf[k] != '\n'; k++)
-   ; /* do nothing */
-   k++;
+   for (i = 0; i < info.trailer_nr; i++)
+   if (sob && !strncmp(info.trailers[i], sob->buf, sob->len)) {
+   found_sob = 1;
+   if (i == info.trailer_nr - 1)
+   found_sob_last = 1;
+   }
 
-   found_rfc2822 = is_rfc2822_line(buf + i, k - i - 1);
-   if (found_rfc2822 && sob &&
-   !strncmp(buf + i, sob->buf, sob->len))
-   found_sob = k;
+   trailer_info_release();
 
-   if (!(found_rfc2822 ||
- is_cherry_picked_from_line(buf + i, k - i - 1)))
-   return 0;
-   }
-   if (found_sob == i)
+   if (found_sob_last)
return 3;
if (found_sob)
return 2;
diff --git a/t/t3511-cherry-pick-x.sh b/t/t3511-cherry-pick-x.sh
index 9cce5ae..bf0a5c9 100755
--- a/t/t3511-cherry-pick-x.sh
+++ b/t/t3511-cherry-pick-x.sh
@@ -25,9 +25,8 @@ Signed-off-by: B.U. Thor <but...@example.com>"
 
 mesg_broken_footer="$mesg_no_footer
 
-The signed-off-by string should begin with the words Signed-off-by followed
-by a colon and space, and then the signers name and email address. e.g.
-Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+This is not recognized as a footer because Myfooter is not a recognized token.
+Myfooter: A.U. Thor <aut...@example.com>"
 
 mesg_with_footer_sob="$mesg_with_footer
 Signed-off-by: $G

[PATCH 1/4] commit: make ignore_non_trailer take buf/len

2016-10-28 Thread Jonathan Tan
Make ignore_non_trailer take a buf/len pair instead of struct strbuf.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/commit.c |  2 +-
 commit.c | 22 +++---
 commit.h |  2 +-
 trailer.c|  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 1cba3b7..c26b939 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -790,7 +790,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
strbuf_stripspace(, 0);
 
if (signoff)
-   append_signoff(, ignore_non_trailer(), 0);
+   append_signoff(, ignore_non_trailer(sb.buf, sb.len), 0);
 
if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
die_errno(_("could not write commit template"));
diff --git a/commit.c b/commit.c
index 856fd4a..f70835a 100644
--- a/commit.c
+++ b/commit.c
@@ -1649,7 +1649,7 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
 }
 
 /*
- * Inspect sb and determine the true "end" of the log message, in
+ * Inspect the given string and determine the true "end" of the log message, in
  * order to find where to put a new Signed-off-by: line.  Ignored are
  * trailing comment lines and blank lines, and also the traditional
  * "Conflicts:" block that is not commented out, so that we can use
@@ -1659,37 +1659,37 @@ const char *find_commit_header(const char *msg, const 
char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-int ignore_non_trailer(struct strbuf *sb)
+int ignore_non_trailer(const char *buf, size_t len)
 {
int boc = 0;
int bol = 0;
int in_old_conflicts_block = 0;
 
-   while (bol < sb->len) {
-   char *next_line;
+   while (bol < len) {
+   const char *next_line;
 
-   if (!(next_line = memchr(sb->buf + bol, '\n', sb->len - bol)))
-   next_line = sb->buf + sb->len;
+   if (!(next_line = memchr(buf + bol, '\n', len - bol)))
+   next_line = buf + len;
else
next_line++;
 
-   if (sb->buf[bol] == comment_line_char || sb->buf[bol] == '\n') {
+   if (buf[bol] == comment_line_char || buf[bol] == '\n') {
/* is this the first of the run of comments? */
if (!boc)
boc = bol;
/* otherwise, it is just continuing */
-   } else if (starts_with(sb->buf + bol, "Conflicts:\n")) {
+   } else if (starts_with(buf + bol, "Conflicts:\n")) {
in_old_conflicts_block = 1;
if (!boc)
boc = bol;
-   } else if (in_old_conflicts_block && sb->buf[bol] == '\t') {
+   } else if (in_old_conflicts_block && buf[bol] == '\t') {
; /* a pathname in the conflicts block */
} else if (boc) {
/* the previous was not trailing comment */
boc = 0;
in_old_conflicts_block = 0;
}
-   bol = next_line - sb->buf;
+   bol = next_line - buf;
}
-   return boc ? sb->len - boc : 0;
+   return boc ? len - boc : 0;
 }
diff --git a/commit.h b/commit.h
index afd14f3..9c12abb 100644
--- a/commit.h
+++ b/commit.h
@@ -355,7 +355,7 @@ extern const char *find_commit_header(const char *msg, 
const char *key,
  size_t *out_len);
 
 /* Find the end of the log message, the right place for a new trailer. */
-extern int ignore_non_trailer(struct strbuf *sb);
+extern int ignore_non_trailer(const char *buf, size_t len);
 
 typedef void (*each_mergetag_fn)(struct commit *commit, struct 
commit_extra_header *extra,
 void *cb_data);
diff --git a/trailer.c b/trailer.c
index d19a92c..6d8375b 100644
--- a/trailer.c
+++ b/trailer.c
@@ -828,7 +828,7 @@ static int find_trailer_end(struct strbuf **lines, int 
patch_start)
 
for (i = 0; i < patch_start; i++)
strbuf_addbuf(, lines[i]);
-   ignore_bytes = ignore_non_trailer();
+   ignore_bytes = ignore_non_trailer(sb.buf, sb.len);
strbuf_release();
for (i = patch_start - 1; i >= 0 && ignore_bytes > 0; i--)
ignore_bytes -= lines[i]->len;
-- 
2.8.0.rc3.226.g39d4020



[PATCH 2/4] trailer: avoid unnecessary splitting on lines

2016-10-28 Thread Jonathan Tan
trailer.c currently splits lines while processing a buffer (and also
rejoins lines when needing to invoke ignore_non_trailer).

Avoid such line splitting, except when generating the strings
corresponding to trailers (for ease of use by clients - a subsequent
patch will allow other components to obtain the layout of a trailer
block in a buffer, including the trailers themselves). The main purpose
of this is to make it easy to return pointers into the original buffer
(for a subsequent patch), but this also significantly reduces the number
of memory allocations required.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 215 +-
 1 file changed, 116 insertions(+), 99 deletions(-)

diff --git a/trailer.c b/trailer.c
index 6d8375b..d4d9e10 100644
--- a/trailer.c
+++ b/trailer.c
@@ -102,12 +102,12 @@ static int same_trailer(struct trailer_item *a, struct 
arg_item *b)
return same_token(a, b) && same_value(a, b);
 }
 
-static inline int contains_only_spaces(const char *str)
+static inline int is_blank_line(const char *str)
 {
const char *s = str;
-   while (*s && isspace(*s))
+   while (*s && *s != '\n' && isspace(*s))
s++;
-   return !*s;
+   return !*s || *s == '\n';
 }
 
 static inline void strbuf_replace(struct strbuf *sb, const char *a, const char 
*b)
@@ -566,13 +566,18 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
 }
 
 /*
- * Return the location of the first separator in line, or -1 if there is no
- * separator.
+ * Return the location of the first separator in the given line, or -1 if there
+ * is no separator.
+ *
+ * The interests parameter must contain the acceptable separators and may
+ * contain '\n'. If interests contains '\n', the input line is considered to
+ * end at the first newline encountered. Otherwise, the input line is
+ * considered to end at its null terminator.
  */
-static int find_separator(const char *line, const char *separators)
+static int find_separator(const char *line, const char *interests)
 {
-   int loc = strcspn(line, separators);
-   if (!line[loc])
+   int loc = strcspn(line, interests);
+   if (!line[loc] || line[loc] == '\n')
return -1;
return loc;
 }
@@ -688,51 +693,71 @@ static void process_command_line_args(struct list_head 
*arg_head,
free(cl_separators);
 }
 
-static struct strbuf **read_input_file(const char *file)
+static void read_input_file(struct strbuf *sb, const char *file)
 {
-   struct strbuf **lines;
-   struct strbuf sb = STRBUF_INIT;
-
if (file) {
-   if (strbuf_read_file(, file, 0) < 0)
+   if (strbuf_read_file(sb, file, 0) < 0)
die_errno(_("could not read input file '%s'"), file);
} else {
-   if (strbuf_read(, fileno(stdin), 0) < 0)
+   if (strbuf_read(sb, fileno(stdin), 0) < 0)
die_errno(_("could not read from stdin"));
}
+}
 
-   lines = strbuf_split(, '\n');
+static const char *next_line(const char *str)
+{
+   const char *nl = strchrnul(str, '\n');
+   return nl + !!*nl;
+}
 
-   strbuf_release();
+/*
+ * Return the position of the start of the last line. If len is 0, return -1.
+ */
+static int last_line(const char *buf, size_t len)
+{
+   int i;
+   if (len == 0)
+   return -1;
+   if (len == 1)
+   return 0;
+   /*
+* Skip the last character (in addition to the null terminator),
+* because if the last character is a newline, it is considered as part
+* of the last line anyway.
+*/
+   i = len - 2;
 
-   return lines;
+   for (; i >= 0; i--) {
+   if (buf[i] == '\n')
+   return i + 1;
+   }
+   return 0;
 }
 
 /*
- * Return the (0 based) index of the start of the patch or the line
- * count if there is no patch in the message.
+ * Return the position of the start of the patch or the length of str if there
+ * is no patch in the message.
  */
-static int find_patch_start(struct strbuf **lines, int count)
+static int find_patch_start(const char *str)
 {
-   int i;
+   const char *s;
 
-   /* Get the start of the patch part if any */
-   for (i = 0; i < count; i++) {
-   if (starts_with(lines[i]->buf, "---"))
-   return i;
+   for (s = str; *s; s = next_line(s)) {
+   if (starts_with(s, "---"))
+   return s - str;
}
 
-   return count;
+   return s - str;
 }
 
 /*
- * Return the (0 based) index of the first trailer line or count if
- * there are no trailers. Trailers are searched only in the lines from
- * index (count - 1) down to index 0.
+ * Return the position of the

[PATCH 0/4] Make other git commands use trailer layout

2016-10-28 Thread Jonathan Tan
This is built off jt/trailer-with-cruft (commit 60ef86a).

This patch set makes "commit -s", "cherry-pick -x", and
"format-patch --signoff" use the new trailer definition implemented in
jt/trailer-with-cruft, with some refactoring along the way. With this
patch set, the aforementioned commands would now handle trailers like
those described in [1].

[1] <84f28caa-2e4b-1231-1a76-3b7e765c0...@google.com>

Jonathan Tan (4):
  commit: make ignore_non_trailer take buf/len
  trailer: avoid unnecessary splitting on lines
  trailer: have function to describe trailer layout
  sequencer: use trailer's trailer layout

 builtin/commit.c |   2 +-
 commit.c |  22 ++--
 commit.h |   2 +-
 sequencer.c  |  75 +++-
 t/t3511-cherry-pick-x.sh |  16 ++-
 t/t4014-format-patch.sh  |  40 +--
 t/t7501-commit.sh|  36 ++
 trailer.c| 295 ---
 trailer.h|  25 
 9 files changed, 313 insertions(+), 200 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



Re: What's cooking in git.git (Oct 2016, #07; Wed, 26)

2016-10-26 Thread Jonathan Tan

On 10/26/2016 03:29 PM, Junio C Hamano wrote:

* jt/trailer-with-cruft (2016-10-21) 8 commits
 - trailer: support values folded to multiple lines
 - trailer: forbid leading whitespace in trailers
 - trailer: allow non-trailers in trailer block
 - trailer: clarify failure modes in parse_trailer
 - trailer: make args have their own struct
 - trailer: streamline trailer item create and add
 - trailer: use list.h for doubly-linked list
 - trailer: improve const correctness

 Update "interpret-trailers" machinery and teaches it that people in
 real world write all sorts of crufts in the "trailer" that was
 originally designed to have the neat-o "Mail-Header: like thing"
 and nothing else.

 Waiting for review.


For this patch set, should I look for another reviewer? In the e-mail 
thread [1], you and Stefan seemed positive about this patch set.


(I've also checked [2] that this is not merged yet.)

[1] 
[2] git log --grep='jt/trailer'


[PATCH v5 3/8] trailer: streamline trailer item create and add

2016-10-21 Thread Jonathan Tan
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 130 +-
 1 file changed, 60 insertions(+), 70 deletions(-)

diff --git a/trailer.c b/trailer.c
index 4e85aae..ae3972a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -500,10 +500,31 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+   struct list_head *pos;
+
strbuf_addstr(, separators);
strbuf_addch(, '=');
len = strcspn(trailer, seps.buf);
@@ -523,74 +544,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(>conf, _item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(>conf, _conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-   struct list_head *pos;
-
-   if (parse_trailer(, , string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = _conf_info;
list_for_each(pos, _head) {
item = list_entry(pos, struct trailer_item, list);
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = >conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct list_head *head, struct trailer_item *new)
+static void add_trailer_item(struct list_head *head, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(>conf, conf);
list_add_tail(>list, head);
 }
 
@@ -599,21 +577,28 @@ static void process_command_line_args(struct list_head 
*arg_head,
 {
struct string_list_item *

[PATCH v5 8/8] trailer: support values folded to multiple lines

2016-10-21 Thread Jonathan Tan
Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:

a: first line
   second line

would be interpreted as one trailer line followed by one non-trailer line.

Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |   7 +-
 t/t7513-interpret-trailers.sh| 169 +++
 trailer.c|  45 ++--
 3 files changed, 211 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 4966b5b..e99bda6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -57,11 +57,12 @@ minus signs start the patch part of the message.
 
 When reading trailers, there can be whitespaces after the
 token, the separator and the value. There can also be whitespaces
-inside the token and the value.
+inside the token and the value. The value may be split over multiple lines with
+each subsequent line starting with whitespace, like the "folding" in RFC 822.
 
 Note that 'trailers' do not follow and are not intended to follow many
-rules for RFC 822 headers. For example they do not follow the line
-folding rules, the encoding rules and probably many other rules.
+rules for RFC 822 headers. For example they do not follow
+the encoding rules and probably many other rules.
 
 OPTIONS
 ---
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 3d94b3a..4dd1d7c 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -256,6 +256,175 @@ test_expect_success 'line with leading whitespace is not 
trailer' '
test_cmp expected actual
 '
 
+test_expect_success 'multiline field treated as one trailer for 25% check' '
+   q_to_tab >patch <<-\EOF &&
+
+   Signed-off-by: a <a...@example.com>
+   name: value on
+   Qmultiple lines
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   Signed-off-by: a <a...@example.com>
+   name: value on
+   Qmultiple lines
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   name: value
+   EOF
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for placement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   name: value
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for replacement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   another: trailer
+   name: value
+   EOF
+   test_config trailer.name.ifexists replace &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for difference check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.ifexists addIfDifferent &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)&quo

[PATCH v5 2/8] trailer: use list.h for doubly-linked list

2016-10-21 Thread Jonathan Tan
Replace the existing handwritten implementation of a doubly-linked list
in trailer.c with the functions and macros from list.h. This
significantly simplifies the code.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com>
---
 trailer.c | 258 ++
 1 file changed, 91 insertions(+), 167 deletions(-)

diff --git a/trailer.c b/trailer.c
index 1f191b2..4e85aae 100644
--- a/trailer.c
+++ b/trailer.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "tempfile.h"
 #include "trailer.h"
+#include "list.h"
 /*
  * Copyright (c) 2013, 2014 Christian Couder <chrisc...@tuxfamily.org>
  */
@@ -25,19 +26,24 @@ struct conf_info {
 static struct conf_info default_conf_info;
 
 struct trailer_item {
-   struct trailer_item *previous;
-   struct trailer_item *next;
+   struct list_head list;
char *token;
char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+static LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
 #define TRAILER_ARG_STRING "$ARG"
 
+/* Iterate over the elements of the list. */
+#define list_for_each_dir(pos, head, is_reverse) \
+   for (pos = is_reverse ? (head)->prev : (head)->next; \
+   pos != (head); \
+   pos = is_reverse ? pos->prev : pos->next)
+
 static int after_or_end(enum action_where where)
 {
return (where == WHERE_AFTER) || (where == WHERE_END);
@@ -120,101 +126,49 @@ static void print_tok_val(FILE *outfile, const char 
*tok, const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct trailer_item *first, int 
trim_empty)
+static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
 {
+   struct list_head *pos;
struct trailer_item *item;
-   for (item = first; item; item = item->next) {
+   list_for_each(pos, head) {
+   item = list_entry(pos, struct trailer_item, list);
if (!trim_empty || strlen(item->value) > 0)
print_tok_val(outfile, item->token, item->value);
}
 }
 
-static void update_last(struct trailer_item **last)
-{
-   if (*last)
-   while ((*last)->next != NULL)
-   *last = (*last)->next;
-}
-
-static void update_first(struct trailer_item **first)
-{
-   if (*first)
-   while ((*first)->previous != NULL)
-   *first = (*first)->previous;
-}
-
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok,
- struct trailer_item **first,
- struct trailer_item **last)
-{
-   if (after_or_end(arg_tok->conf.where)) {
-   arg_tok->next = on_tok->next;
-   on_tok->next = arg_tok;
-   arg_tok->previous = on_tok;
-   if (arg_tok->next)
-   arg_tok->next->previous = arg_tok;
-   update_last(last);
-   } else {
-   arg_tok->previous = on_tok->previous;
-   on_tok->previous = arg_tok;
-   arg_tok->next = on_tok;
-   if (arg_tok->previous)
-   arg_tok->previous->next = arg_tok;
-   update_first(first);
-   }
+ struct trailer_item *arg_tok)
+{
+   if (after_or_end(arg_tok->conf.where))
+   list_add(_tok->list, _tok->list);
+   else
+   list_add_tail(_tok->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
  struct trailer_item *arg_tok,
- int check_all)
+ int check_all,
+ struct list_head *head)
 {
enum action_where where = arg_tok->conf.where;
+   struct list_head *next_head;
do {
-   if (!in_tok)
-   return 1;
if (same_trailer(in_tok, arg_tok))
return 0;
/*
 * if we want to add a trailer after another one,
 * we have to check those before this one
 */
-   in_tok = after_or_end(where) ? in_tok->previous : in_tok->next;
+   next_head = after_or_end(where) ? in_tok->list.prev
+   : in_tok->list.next;
+   if (next_head == head)
+   break;
+   in_tok = list_entry(next_head, struct trailer_item, list);
} while (check_all);
return 1;
 }
 
-static void 

[PATCH v5 7/8] trailer: forbid leading whitespace in trailers

2016-10-21 Thread Jonathan Tan
Currently, interpret-trailers allows leading whitespace in trailer
lines. This leads to false positives, especially for quoted lines or
bullet lists.

Forbid leading whitespace in trailers.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |  2 +-
 t/t7513-interpret-trailers.sh| 15 +++
 trailer.c|  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index cf4c5ea..4966b5b 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -55,7 +55,7 @@ The group must either be at the end of the message or be the 
last
 non-whitespace lines before a line that starts with '---'. Such three
 minus signs start the patch part of the message.
 
-When reading trailers, there can be whitespaces before and after the
+When reading trailers, there can be whitespaces after the
 token, the separator and the value. There can also be whitespaces
 inside the token and the value.
 
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 003e90f..3d94b3a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -241,6 +241,21 @@ test_expect_success 'with non-trailer lines only' '
test_cmp expected actual
 '
 
+test_expect_success 'line with leading whitespace is not trailer' '
+   q_to_tab >patch <<-\EOF &&
+
+   Qtoken: value
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   Qtoken: value
+
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'with config setup' '
git config trailer.ack.key "Acked-by: " &&
cat >expected <<-\EOF &&
diff --git a/trailer.c b/trailer.c
index 199f86a..d978437 100644
--- a/trailer.c
+++ b/trailer.c
@@ -775,7 +775,7 @@ static int find_trailer_start(struct strbuf **lines, int 
count)
}
 
separator_pos = find_separator(lines[start]->buf);
-   if (separator_pos >= 1) {
+   if (separator_pos >= 1 && !isspace(lines[start]->buf[0])) {
struct list_head *pos;
 
trailer_lines++;
-- 
2.8.0.rc3.226.g39d4020



[PATCH v5 6/8] trailer: allow non-trailers in trailer block

2016-10-21 Thread Jonathan Tan
Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line.  For example:

  echo -e "\nSigned-off-by: x\nnot trailer" |
  git interpret-trailers --trailer "c: d"

would result in:

  Signed-off-by: x
  not trailer

  c: d

Relax the definition of a trailer block to require that the trailers (i)
are all trailers, or (ii) contain at least one Git-generated trailer and
consists of at least 25% trailers.

  Signed-off-by: x
  not trailer
  c: d

(i) is the existing functionality. (ii) allows arbitrary lines to be
included in trailer blocks, like those in [1], and still allow
interpret-trailers to be used.

[1]
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/e7d316a02f683864a12389f8808570e37fb90aa3

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |   5 +-
 t/t7513-interpret-trailers.sh| 115 +++
 trailer.c|  89 
 3 files changed, 194 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 93d1db6..cf4c5ea 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -48,8 +48,9 @@ with only spaces at the end of the commit message part, one 
blank line
 will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
-the group is preceded by one or more empty (or whitespace-only) lines.
+a group of one or more lines that (i) are all trailers, or (ii) contains at
+least one Git-generated trailer and consists of at least 25% trailers.
+The group must be preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
 minus signs start the patch part of the message.
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index aee785c..003e90f 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -126,6 +126,121 @@ test_expect_success 'with multiline title in the message' 
'
test_cmp expected actual
 '
 
+test_expect_success 'with non-trailer lines mixed with Signed-off-by' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   Signed-off-by: a <a...@example.com>
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   Signed-off-by: a <a...@example.com>
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with cherry picked from' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   (cherry picked from commit x)
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   (cherry picked from commit x)
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with a configured trailer' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   My-trailer: x
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   My-trailer: x
+   this is not a trailer
+   token: value
+   EOF
+   test_config trailer.my.key "My-trailer: " &&
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with a non-configured 
trailer' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   I-am-not-configured: x
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+  

[PATCH v5 0/8] allow non-trailers and multiple-line trailers

2016-10-21 Thread Jonathan Tan
I've updated patch 5/8 to use strcspn and to pass in the list of
separators, meaning that we no longer accept '=' in file input (and also
updated its commit message accordingly).

We also discussed inlining find_separator, but after looking at the
code, I think that it is more convenient if find_separator returns -1
when there is no separator, because of the 3 times it is used (as
of 8/8), it is checked twice with '>= 1' (since both "no separator" and
"string begins with separator" are handled in the same way - treating
them as a non-trailer line). So I have left it as its own function.

No other updates.

Jonathan Tan (8):
  trailer: improve const correctness
  trailer: use list.h for doubly-linked list
  trailer: streamline trailer item create and add
  trailer: make args have their own struct
  trailer: clarify failure modes in parse_trailer
  trailer: allow non-trailers in trailer block
  trailer: forbid leading whitespace in trailers
  trailer: support values folded to multiple lines

 Documentation/git-interpret-trailers.txt |  14 +-
 t/t7513-interpret-trailers.sh| 299 +++
 trailer.c| 620 +--
 3 files changed, 654 insertions(+), 279 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH v5 1/8] trailer: improve const correctness

2016-10-21 Thread Jonathan Tan
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).

Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index c6ea9ac..1f191b2 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,8 +27,8 @@ static struct conf_info default_conf_info;
 struct trailer_item {
struct trailer_item *previous;
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -95,8 +95,8 @@ static void free_trailer_item(struct trailer_item *item)
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -215,13 +215,13 @@ static struct trailer_item *remove_first(struct 
trailer_item **first)
return item;
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(, command);
if (arg)
@@ -425,7 +425,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
-- 
2.8.0.rc3.226.g39d4020



[PATCH v5 5/8] trailer: clarify failure modes in parse_trailer

2016-10-21 Thread Jonathan Tan
The parse_trailer function has a few modes of operation, all depending
on whether the separator is present in its input, and if yes, the
separator's position. Some of these modes are failure modes, and these
failure modes are handled differently depending on whether the trailer
line was sourced from a file or from a command-line argument.

Extract a function to find the separator, allowing the invokers of
parse_trailer to determine how to handle the failure modes instead of
making parse_trailer do it. In this function, also take in the list of
separators, so that we can distinguish between command line arguments
(which allow '=' as separator) and file input (which does not allow '='
as separator).

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 75 ---
 1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/trailer.c b/trailer.c
index 99018f8..aff858b 100644
--- a/trailer.c
+++ b/trailer.c
@@ -543,29 +543,37 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val,
-const struct conf_info **conf, const char *trailer)
+/*
+ * Return the location of the first separator in line, or -1 if there is no
+ * separator.
+ */
+static int find_separator(const char *line, const char *separators)
+{
+   int loc = strcspn(line, separators);
+   if (!line[loc])
+   return -1;
+   return loc;
+}
+
+/*
+ * Obtain the token, value, and conf from the given trailer.
+ *
+ * separator_pos must not be 0, since the token cannot be an empty string.
+ *
+ * If separator_pos is -1, interpret the whole trailer as a token.
+ */
+static void parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer,
+int separator_pos)
 {
-   size_t len;
-   struct strbuf seps = STRBUF_INIT;
struct arg_item *item;
int tok_len;
struct list_head *pos;
 
-   strbuf_addstr(, separators);
-   strbuf_addch(, '=');
-   len = strcspn(trailer, seps.buf);
-   strbuf_release();
-   if (len == 0) {
-   int l = strlen(trailer);
-   while (l > 0 && isspace(trailer[l - 1]))
-   l--;
-   return error(_("empty trailer token in trailer '%.*s'"), l, 
trailer);
-   }
-   if (len < strlen(trailer)) {
-   strbuf_add(tok, trailer, len);
+   if (separator_pos != -1) {
+   strbuf_add(tok, trailer, separator_pos);
strbuf_trim(tok);
-   strbuf_addstr(val, trailer + len + 1);
+   strbuf_addstr(val, trailer + separator_pos + 1);
strbuf_trim(val);
} else {
strbuf_addstr(tok, trailer);
@@ -587,8 +595,6 @@ static int parse_trailer(struct strbuf *tok, struct strbuf 
*val,
break;
}
}
-
-   return 0;
 }
 
 static void add_trailer_item(struct list_head *head, char *tok, char *val)
@@ -619,6 +625,12 @@ static void process_command_line_args(struct list_head 
*arg_head,
const struct conf_info *conf;
struct list_head *pos;
 
+   /*
+* In command-line arguments, '=' is accepted (in addition to the
+* separators that are defined).
+*/
+   char *cl_separators = xstrfmt("=%s", separators);
+
/* Add an arg item for each configured trailer with a command */
list_for_each(pos, _head) {
item = list_entry(pos, struct arg_item, list);
@@ -631,12 +643,25 @@ static void process_command_line_args(struct list_head 
*arg_head,
 
/* Add an arg item for each trailer on the command line */
for_each_string_list_item(tr, trailers) {
-   if (!parse_trailer(, , , tr->string))
+   int separator_pos = find_separator(tr->string, cl_separators);
+   if (separator_pos == 0) {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(, tr->string);
+   strbuf_trim();
+   error(_("empty trailer token in trailer '%.*s'"),
+ (int) sb.len, sb.buf);
+   strbuf_release();
+   } else {
+   parse_trailer(, , , tr->string,
+ separator_pos);
add_arg_item(arg_head,
 strbuf_detach(, NULL),
 strbuf_detach(, NULL),
 conf);
+   }
}
+
+   free(cl_separators);
 }
 
 static struct strbuf **read_input_file(const 

[PATCH v5 4/8] trailer: make args have their own struct

2016-10-21 Thread Jonathan Tan
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 135 +++---
 1 file changed, 85 insertions(+), 50 deletions(-)

diff --git a/trailer.c b/trailer.c
index ae3972a..99018f8 100644
--- a/trailer.c
+++ b/trailer.c
@@ -29,6 +29,12 @@ struct trailer_item {
struct list_head list;
char *token;
char *value;
+};
+
+struct arg_item {
+   struct list_head list;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -62,7 +68,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(struct trailer_item *a, struct trailer_item *b)
+static int same_token(struct trailer_item *a, struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -71,12 +77,12 @@ static int same_token(struct trailer_item *a, struct 
trailer_item *b)
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(struct trailer_item *a, struct trailer_item *b)
+static int same_value(struct trailer_item *a, struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(struct trailer_item *a, struct trailer_item *b)
+static int same_trailer(struct trailer_item *a, struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -98,6 +104,13 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
@@ -137,17 +150,29 @@ static void print_all(FILE *outfile, struct list_head 
*head, int trim_empty)
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok)
+ struct arg_item *arg_tok)
 {
-   if (after_or_end(arg_tok->conf.where))
-   list_add(_tok->list, _tok->list);
+   int aoe = after_or_end(arg_tok->conf.where);
+   struct trailer_item *to_add = trailer_from_arg(arg_tok);
+   if (aoe)
+   list_add(_add->list, _tok->list);
else
-   list_add_tail(_tok->list, _tok->list);
+   list_add_tail(_add->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
- struct trailer_item *arg_tok,
+ struct arg_item *arg_tok,
  int check_all,
  struct list_head *head)
 {
@@ -200,7 +225,7 @@ static char *apply_command(const char *command, const char 
*arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -218,13 +243,13 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
 }
 
 static void apply_arg_if_exists(struct trailer_item *in_tok,
-   struct trailer_item *arg_tok,
+   struct arg_item *arg_tok,
struct trailer_item *on_tok,
struct list_head *head)
 {
switch (arg_tok->conf.if_exists) {
case EXISTS_DO_NOTHING:
-   free_trailer_item(arg_tok);
+   free_arg_item(arg_tok);
break;
case EXISTS_REPLACE:
apply_item_command(in_tok, arg_tok);
@@ -241,39 +266,41 @@ static void apply_arg_if_exists(struct trailer_item 
*in_tok,
if (check_if_different(in_tok, arg_tok, 1, head))
add_arg_to_input_list(on_tok, arg_tok);
 

Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:45 PM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:


If we do that, there is also the necessity of creating a string that
combines the separators and '=' (I guess '\n' is not necessary now,
since all the lines are null terminated). I'm OK either way.

(We could cache that string, although I would think that if we did
that, we might as well write the loop manually, like in this patch.)


I wonder if there is a legit reason to look for '=' in the first
place.  "Signed-off-by= Jonathan Tan <j...@my.home>" does not look
like a valid trailer line to me.

Isn't that a remnant of lazy coding in the original that tried to
share a single parser for contents and command line options or
something?


That is true - I think we can take the allowed separators as an argument 
(meaning that we can have different behavior for file parsing and 
command line parsing), and since we already have that string, we can use 
strcspn. I'll try this out in the next reroll.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:40 PM, Jonathan Tan wrote:

On 10/20/2016 03:14 PM, Junio C Hamano wrote:

Stefan Beller <sbel...@google.com> writes:


+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }


I was about to suggest this function can be simplified and maybe
even inlined by the use of strspn or strcspn, but I think manual
processing of the string is fine, too, as it would not really be
shorter.


Hmm, I fear that iterating over a line one-byte-at-a-time and
running strchr(separators, *c) on it for each byte has a performance
implication over running a single call to strcspn(line, separators).


If we do that, there is also the necessity of creating a string that
combines the separators and '=' (I guess '\n' is not necessary now,
since all the lines are null terminated). I'm OK either way.

(We could cache that string, although I would think that if we did that,
we might as well write the loop manually, like in this patch.)


Actually I guess we could generate the separators_and_equal string 
whenever we obtain new separators from the config. I'll do this in the 
next reroll.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:14 PM, Junio C Hamano wrote:

Stefan Beller  writes:


+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }


I was about to suggest this function can be simplified and maybe
even inlined by the use of strspn or strcspn, but I think manual
processing of the string is fine, too, as it would not really be shorter.


Hmm, I fear that iterating over a line one-byte-at-a-time and
running strchr(separators, *c) on it for each byte has a performance
implication over running a single call to strcspn(line, separators).


If we do that, there is also the necessity of creating a string that 
combines the separators and '=' (I guess '\n' is not necessary now, 
since all the lines are null terminated). I'm OK either way.


(We could cache that string, although I would think that if we did that, 
we might as well write the loop manually, like in this patch.)


[PATCH v4 4/8] trailer: make args have their own struct

2016-10-20 Thread Jonathan Tan
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 135 +++---
 1 file changed, 85 insertions(+), 50 deletions(-)

diff --git a/trailer.c b/trailer.c
index ae3972a..99018f8 100644
--- a/trailer.c
+++ b/trailer.c
@@ -29,6 +29,12 @@ struct trailer_item {
struct list_head list;
char *token;
char *value;
+};
+
+struct arg_item {
+   struct list_head list;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -62,7 +68,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(struct trailer_item *a, struct trailer_item *b)
+static int same_token(struct trailer_item *a, struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -71,12 +77,12 @@ static int same_token(struct trailer_item *a, struct 
trailer_item *b)
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(struct trailer_item *a, struct trailer_item *b)
+static int same_value(struct trailer_item *a, struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(struct trailer_item *a, struct trailer_item *b)
+static int same_trailer(struct trailer_item *a, struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -98,6 +104,13 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
@@ -137,17 +150,29 @@ static void print_all(FILE *outfile, struct list_head 
*head, int trim_empty)
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok)
+ struct arg_item *arg_tok)
 {
-   if (after_or_end(arg_tok->conf.where))
-   list_add(_tok->list, _tok->list);
+   int aoe = after_or_end(arg_tok->conf.where);
+   struct trailer_item *to_add = trailer_from_arg(arg_tok);
+   if (aoe)
+   list_add(_add->list, _tok->list);
else
-   list_add_tail(_tok->list, _tok->list);
+   list_add_tail(_add->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
- struct trailer_item *arg_tok,
+ struct arg_item *arg_tok,
  int check_all,
  struct list_head *head)
 {
@@ -200,7 +225,7 @@ static char *apply_command(const char *command, const char 
*arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -218,13 +243,13 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
 }
 
 static void apply_arg_if_exists(struct trailer_item *in_tok,
-   struct trailer_item *arg_tok,
+   struct arg_item *arg_tok,
struct trailer_item *on_tok,
struct list_head *head)
 {
switch (arg_tok->conf.if_exists) {
case EXISTS_DO_NOTHING:
-   free_trailer_item(arg_tok);
+   free_arg_item(arg_tok);
break;
case EXISTS_REPLACE:
apply_item_command(in_tok, arg_tok);
@@ -241,39 +266,41 @@ static void apply_arg_if_exists(struct trailer_item 
*in_tok,
if (check_if_different(in_tok, arg_tok, 1, head))
add_arg_to_input_list(on_tok, arg_tok);
 

[PATCH v4 3/8] trailer: streamline trailer item create and add

2016-10-20 Thread Jonathan Tan
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 130 +-
 1 file changed, 60 insertions(+), 70 deletions(-)

diff --git a/trailer.c b/trailer.c
index 4e85aae..ae3972a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -500,10 +500,31 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+   struct list_head *pos;
+
strbuf_addstr(, separators);
strbuf_addch(, '=');
len = strcspn(trailer, seps.buf);
@@ -523,74 +544,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(>conf, _item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(>conf, _conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-   struct list_head *pos;
-
-   if (parse_trailer(, , string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = _conf_info;
list_for_each(pos, _head) {
item = list_entry(pos, struct trailer_item, list);
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = >conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct list_head *head, struct trailer_item *new)
+static void add_trailer_item(struct list_head *head, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(>conf, conf);
list_add_tail(>list, head);
 }
 
@@ -599,21 +577,28 @@ static void process_command_line_args(struct list_head 
*arg_head,
 {
struct string_list_item *

[PATCH v4 6/8] trailer: allow non-trailers in trailer block

2016-10-20 Thread Jonathan Tan
Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line.  For example:

  echo -e "\nSigned-off-by: x\nnot trailer" |
  git interpret-trailers --trailer "c: d"

would result in:

  Signed-off-by: x
  not trailer

  c: d

Relax the definition of a trailer block to require that the trailers (i)
are all trailers, or (ii) contain at least one Git-generated trailer and
consists of at least 25% trailers.

  Signed-off-by: x
  not trailer
  c: d

(i) is the existing functionality. (ii) allows arbitrary lines to be
included in trailer blocks, like those in [1], and still allow
interpret-trailers to be used.

[1]
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/e7d316a02f683864a12389f8808570e37fb90aa3

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |   5 +-
 t/t7513-interpret-trailers.sh| 115 +++
 trailer.c|  89 
 3 files changed, 194 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 93d1db6..cf4c5ea 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -48,8 +48,9 @@ with only spaces at the end of the commit message part, one 
blank line
 will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
-the group is preceded by one or more empty (or whitespace-only) lines.
+a group of one or more lines that (i) are all trailers, or (ii) contains at
+least one Git-generated trailer and consists of at least 25% trailers.
+The group must be preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
 minus signs start the patch part of the message.
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index aee785c..003e90f 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -126,6 +126,121 @@ test_expect_success 'with multiline title in the message' 
'
test_cmp expected actual
 '
 
+test_expect_success 'with non-trailer lines mixed with Signed-off-by' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   Signed-off-by: a <a...@example.com>
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   Signed-off-by: a <a...@example.com>
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with cherry picked from' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   (cherry picked from commit x)
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   (cherry picked from commit x)
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with a configured trailer' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   My-trailer: x
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   My-trailer: x
+   this is not a trailer
+   token: value
+   EOF
+   test_config trailer.my.key "My-trailer: " &&
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines mixed with a non-configured 
trailer' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   this is not a trailer
+   I-am-not-configured: x
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+  

[PATCH v4 2/8] trailer: use list.h for doubly-linked list

2016-10-20 Thread Jonathan Tan
Replace the existing handwritten implementation of a doubly-linked list
in trailer.c with the functions and macros from list.h. This
significantly simplifies the code.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com>
---
 trailer.c | 258 ++
 1 file changed, 91 insertions(+), 167 deletions(-)

diff --git a/trailer.c b/trailer.c
index 1f191b2..4e85aae 100644
--- a/trailer.c
+++ b/trailer.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "tempfile.h"
 #include "trailer.h"
+#include "list.h"
 /*
  * Copyright (c) 2013, 2014 Christian Couder <chrisc...@tuxfamily.org>
  */
@@ -25,19 +26,24 @@ struct conf_info {
 static struct conf_info default_conf_info;
 
 struct trailer_item {
-   struct trailer_item *previous;
-   struct trailer_item *next;
+   struct list_head list;
char *token;
char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+static LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
 #define TRAILER_ARG_STRING "$ARG"
 
+/* Iterate over the elements of the list. */
+#define list_for_each_dir(pos, head, is_reverse) \
+   for (pos = is_reverse ? (head)->prev : (head)->next; \
+   pos != (head); \
+   pos = is_reverse ? pos->prev : pos->next)
+
 static int after_or_end(enum action_where where)
 {
return (where == WHERE_AFTER) || (where == WHERE_END);
@@ -120,101 +126,49 @@ static void print_tok_val(FILE *outfile, const char 
*tok, const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct trailer_item *first, int 
trim_empty)
+static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
 {
+   struct list_head *pos;
struct trailer_item *item;
-   for (item = first; item; item = item->next) {
+   list_for_each(pos, head) {
+   item = list_entry(pos, struct trailer_item, list);
if (!trim_empty || strlen(item->value) > 0)
print_tok_val(outfile, item->token, item->value);
}
 }
 
-static void update_last(struct trailer_item **last)
-{
-   if (*last)
-   while ((*last)->next != NULL)
-   *last = (*last)->next;
-}
-
-static void update_first(struct trailer_item **first)
-{
-   if (*first)
-   while ((*first)->previous != NULL)
-   *first = (*first)->previous;
-}
-
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok,
- struct trailer_item **first,
- struct trailer_item **last)
-{
-   if (after_or_end(arg_tok->conf.where)) {
-   arg_tok->next = on_tok->next;
-   on_tok->next = arg_tok;
-   arg_tok->previous = on_tok;
-   if (arg_tok->next)
-   arg_tok->next->previous = arg_tok;
-   update_last(last);
-   } else {
-   arg_tok->previous = on_tok->previous;
-   on_tok->previous = arg_tok;
-   arg_tok->next = on_tok;
-   if (arg_tok->previous)
-   arg_tok->previous->next = arg_tok;
-   update_first(first);
-   }
+ struct trailer_item *arg_tok)
+{
+   if (after_or_end(arg_tok->conf.where))
+   list_add(_tok->list, _tok->list);
+   else
+   list_add_tail(_tok->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
  struct trailer_item *arg_tok,
- int check_all)
+ int check_all,
+ struct list_head *head)
 {
enum action_where where = arg_tok->conf.where;
+   struct list_head *next_head;
do {
-   if (!in_tok)
-   return 1;
if (same_trailer(in_tok, arg_tok))
return 0;
/*
 * if we want to add a trailer after another one,
 * we have to check those before this one
 */
-   in_tok = after_or_end(where) ? in_tok->previous : in_tok->next;
+   next_head = after_or_end(where) ? in_tok->list.prev
+   : in_tok->list.next;
+   if (next_head == head)
+   break;
+   in_tok = list_entry(next_head, struct trailer_item, list);
} while (check_all);
return 1;
 }
 
-static void 

[PATCH v4 7/8] trailer: forbid leading whitespace in trailers

2016-10-20 Thread Jonathan Tan
Currently, interpret-trailers allows leading whitespace in trailer
lines. This leads to false positives, especially for quoted lines or
bullet lists.

Forbid leading whitespace in trailers.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |  2 +-
 t/t7513-interpret-trailers.sh| 15 +++
 trailer.c|  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index cf4c5ea..4966b5b 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -55,7 +55,7 @@ The group must either be at the end of the message or be the 
last
 non-whitespace lines before a line that starts with '---'. Such three
 minus signs start the patch part of the message.
 
-When reading trailers, there can be whitespaces before and after the
+When reading trailers, there can be whitespaces after the
 token, the separator and the value. There can also be whitespaces
 inside the token and the value.
 
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 003e90f..3d94b3a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -241,6 +241,21 @@ test_expect_success 'with non-trailer lines only' '
test_cmp expected actual
 '
 
+test_expect_success 'line with leading whitespace is not trailer' '
+   q_to_tab >patch <<-\EOF &&
+
+   Qtoken: value
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   Qtoken: value
+
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'with config setup' '
git config trailer.ack.key "Acked-by: " &&
cat >expected <<-\EOF &&
diff --git a/trailer.c b/trailer.c
index da15b79..3ef5576 100644
--- a/trailer.c
+++ b/trailer.c
@@ -770,7 +770,7 @@ static int find_trailer_start(struct strbuf **lines, int 
count)
}
 
separator_pos = find_separator(lines[start]->buf);
-   if (separator_pos >= 1) {
+   if (separator_pos >= 1 && !isspace(lines[start]->buf[0])) {
struct list_head *pos;
 
trailer_lines++;
-- 
2.8.0.rc3.226.g39d4020



[PATCH v4 8/8] trailer: support values folded to multiple lines

2016-10-20 Thread Jonathan Tan
Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:

a: first line
   second line

would be interpreted as one trailer line followed by one non-trailer line.

Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |   7 +-
 t/t7513-interpret-trailers.sh| 169 +++
 trailer.c|  43 ++--
 3 files changed, 210 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 4966b5b..e99bda6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -57,11 +57,12 @@ minus signs start the patch part of the message.
 
 When reading trailers, there can be whitespaces after the
 token, the separator and the value. There can also be whitespaces
-inside the token and the value.
+inside the token and the value. The value may be split over multiple lines with
+each subsequent line starting with whitespace, like the "folding" in RFC 822.
 
 Note that 'trailers' do not follow and are not intended to follow many
-rules for RFC 822 headers. For example they do not follow the line
-folding rules, the encoding rules and probably many other rules.
+rules for RFC 822 headers. For example they do not follow
+the encoding rules and probably many other rules.
 
 OPTIONS
 ---
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 3d94b3a..4dd1d7c 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -256,6 +256,175 @@ test_expect_success 'line with leading whitespace is not 
trailer' '
test_cmp expected actual
 '
 
+test_expect_success 'multiline field treated as one trailer for 25% check' '
+   q_to_tab >patch <<-\EOF &&
+
+   Signed-off-by: a <a...@example.com>
+   name: value on
+   Qmultiple lines
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   Signed-off-by: a <a...@example.com>
+   name: value on
+   Qmultiple lines
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   this is not a trailer
+   name: value
+   EOF
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for placement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   name: value
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for replacement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   another: trailer
+   name: value
+   EOF
+   test_config trailer.name.ifexists replace &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for difference check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.ifexists addIfDifferent &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)&quo

[PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan
The parse_trailer function has a few modes of operation, all depending
on whether the separator is present in its input, and if yes, the
separator's position. Some of these modes are failure modes, and these
failure modes are handled differently depending on whether the trailer
line was sourced from a file or from a command-line argument.

Extract a function to find the separator, allowing the invokers of
parse_trailer to determine how to handle the failure modes instead of
making parse_trailer do it.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 70 +++
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/trailer.c b/trailer.c
index 99018f8..137a3fb 100644
--- a/trailer.c
+++ b/trailer.c
@@ -543,29 +543,40 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val,
-const struct conf_info **conf, const char *trailer)
+/*
+ * Return the location of the first separator or '=' in line, or -1 if either a
+ * newline or the null terminator is reached first.
+ */
+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }
+}
+
+/*
+ * Obtain the token, value, and conf from the given trailer.
+ *
+ * separator_pos must not be 0, since the token cannot be an empty string.
+ *
+ * If separator_pos is -1, interpret the whole trailer as a token.
+ */
+static void parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer,
+int separator_pos)
 {
-   size_t len;
-   struct strbuf seps = STRBUF_INIT;
struct arg_item *item;
int tok_len;
struct list_head *pos;
 
-   strbuf_addstr(, separators);
-   strbuf_addch(, '=');
-   len = strcspn(trailer, seps.buf);
-   strbuf_release();
-   if (len == 0) {
-   int l = strlen(trailer);
-   while (l > 0 && isspace(trailer[l - 1]))
-   l--;
-   return error(_("empty trailer token in trailer '%.*s'"), l, 
trailer);
-   }
-   if (len < strlen(trailer)) {
-   strbuf_add(tok, trailer, len);
+   if (separator_pos != -1) {
+   strbuf_add(tok, trailer, separator_pos);
strbuf_trim(tok);
-   strbuf_addstr(val, trailer + len + 1);
+   strbuf_addstr(val, trailer + separator_pos + 1);
strbuf_trim(val);
} else {
strbuf_addstr(tok, trailer);
@@ -587,8 +598,6 @@ static int parse_trailer(struct strbuf *tok, struct strbuf 
*val,
break;
}
}
-
-   return 0;
 }
 
 static void add_trailer_item(struct list_head *head, char *tok, char *val)
@@ -631,11 +640,22 @@ static void process_command_line_args(struct list_head 
*arg_head,
 
/* Add an arg item for each trailer on the command line */
for_each_string_list_item(tr, trailers) {
-   if (!parse_trailer(, , , tr->string))
+   int separator_pos = find_separator(tr->string);
+   if (separator_pos == 0) {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(, tr->string);
+   strbuf_trim();
+   error(_("empty trailer token in trailer '%.*s'"),
+ (int) sb.len, sb.buf);
+   strbuf_release();
+   } else {
+   parse_trailer(, , , tr->string,
+ separator_pos);
add_arg_item(arg_head,
 strbuf_detach(, NULL),
 strbuf_detach(, NULL),
 conf);
+   }
}
 }
 
@@ -775,11 +795,17 @@ static int process_input_file(FILE *outfile,
 
/* Parse trailer lines */
for (i = trailer_start; i < trailer_end; i++) {
-   if (lines[i]->buf[0] != comment_line_char &&
-   !parse_trailer(, , NULL, lines[i]->buf))
+   int separator_pos;
+   if (lines[i]->buf[0] == comment_line_char)
+   continue;
+   separator_pos = find_separator(lines[i]->buf);
+   if (separator_pos >= 1) {
+   parse_trailer(, , NULL, lines[i]->buf,
+ separator_pos);
 

[PATCH v4 0/8] allow non-trailers and multiple-line trailers

2016-10-20 Thread Jonathan Tan
Main changes are:
 - implemented the previously discussed trailer block recognizing rule
   (recognized trailer + 25% trailers or 100% trailers)
 - forbidding leading whitespace in trailers to avoid false positives

Once the recognized trailer + 25% trailers rule is implemented,
implementing the 100% trailer rule gives us backwards compatibility and
is only a few lines of code, so I included it.

Summary of changes from v3:
 2/6->2/8:
   - squashed Ramsay Jones's "static" patch
 new->5/8:
   - new patch
 5/6->6/8:
   - new trailer block recognizing rule
   - reverted to the existing behavior of ignoring comments, since the
 number of trailers and non-trailers in the trailer block now
 matters more
 new->7/8:
   - new patch
 6/6->8/8:
   - updated trailer block recognizing code, since the continuation
 lines must not be counted if they follow a trailer line

Jonathan Tan (8):
  trailer: improve const correctness
  trailer: use list.h for doubly-linked list
  trailer: streamline trailer item create and add
  trailer: make args have their own struct
  trailer: clarify failure modes in parse_trailer
  trailer: allow non-trailers in trailer block
  trailer: forbid leading whitespace in trailers
  trailer: support values folded to multiple lines

 Documentation/git-interpret-trailers.txt |  14 +-
 t/t7513-interpret-trailers.sh| 299 +++
 trailer.c| 619 +--
 3 files changed, 651 insertions(+), 281 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH v4 1/8] trailer: improve const correctness

2016-10-20 Thread Jonathan Tan
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).

Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index c6ea9ac..1f191b2 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,8 +27,8 @@ static struct conf_info default_conf_info;
 struct trailer_item {
struct trailer_item *previous;
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -95,8 +95,8 @@ static void free_trailer_item(struct trailer_item *item)
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -215,13 +215,13 @@ static struct trailer_item *remove_first(struct 
trailer_item **first)
return item;
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(, command);
if (arg)
@@ -425,7 +425,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH v3 5/6] trailer: allow non-trailers in trailer block

2016-10-19 Thread Jonathan Tan

On 10/18/2016 09:36 AM, Junio C Hamano wrote:

Jonathan Tan <jonathanta...@google.com> writes:


* rs/c-auto-resets-attributes:
  pretty: avoid adding reset for %C(auto) if output is empty

And neither of the two colon containing line remotely resembles how
a typical RFC-822 header is formatted.  So that may serve as a hint
to how we can tighten it without introducing false negative.


The only "offending" character is the space (according to RFC 822),
but that sounds like a good rule to have.


I suspect that we should be willing to deviate from the letter of
RFC to reject misidentification.  I saw things like

    Thanks to: Jonathan Tan <j...@host.xz>
Signed-off-by: A U Thor <a...@th.or>

in the wild (notice the SP between Thanks and to), for example.
Rejecting leading whitespace as a line that does *not* start the
header (hence its colon does not count) may be a good compromise.


Good point.


I think that "Signed-off-by:" is not guaranteed to be
present.


But do we really care about that case where there is no S-o-b:?  I
personally do not think so.


I think we should - the use case I had in mind when I started this is 
the Android Git repository, which does not use Signed-off-by. For 
example, I quoted this commit in an earlier e-mail [1]:


https://android.googlesource.com/platform/frameworks/base/+/4c5281862f750cbc9d7355a07ef1a5545b9b3523

which has the footer:

  Bug: http://b/30562229
  Test: readelf --dyn-sym app_process32 and check that bsd_signal is 
exported
readelf --dyn-sym app_process64 and check that bsd_signal is 
not exported

  Change-Id: Iec584070b42bc7fa43b114c0f884aff2db5a6858


Defining a trailer line as "a line starting with a token,
then optional whitespace, then separator", maybe the following rule:
- at least one trailer line generated by Git ("(cherry picked by" or
"Signed-off-by") or configured in the "trailer" section in gitconfig
OR
- at least 3/4 logical trailer lines (I'm wondering if this should be
100% trailer lines)


I'd strongly suggest turning that OR to AND.  We will not safely be
able to write a commit log message that describes how S-o-b lines
are handled in its last paragraph otherwise.

I do not care too deeply about 3/4, but I meant to allow 75% cruft
but no more than that, and the fact that the threashold is set at
way more than 50% is important.  IOW, if you have

Ordinary log message here...

S-o-b: A U Thor <a...@th.or>
[a short description that is typically a single liner
in the real world use pattern we saw in the world, but
could overflow to become multi line cruft]
S-o-b: R E Layer <r...@lay.er>

"last paragraph" is 5 lines long, among which 60% are cruft that is
below the 75% threshold, and "am -s" can still add the S-o-b of the
committer at the end of that existing last paragraph.  Making it too
strict would raise the false negative ratio.


Ah, sorry, I misread your original suggestion.

Would this work then:
- at least one trailer line generated by Git ("(cherry picked by" or
  "Signed-off-by: ") or configured in the "trailer" section in
  git config AND at least 25% logical trailer lines
OR
- 100% logical trailer lines

The first part is your original suggestion except that I think that it 
is more consistent to allow other trailer lines as well (but I do not 
feel too strongly about this). The second part would satisfy the Android 
Git use case above, and also retain existing behavior when 
"Signed-off-by" (for example) is added to an existing footer that does 
not contain "Signed-off-by" yet.


What do you think?

[1] Message ID <29cb0f55-f729-80af-cdca-64e927fa9...@google.com>


Re: [PATCH v3 5/6] trailer: allow non-trailers in trailer block

2016-10-17 Thread Jonathan Tan

On 10/17/2016 06:42 PM, Junio C Hamano wrote:

Stefan Beller <sbel...@google.com> writes:


On Fri, Oct 14, 2016 at 10:38 AM, Jonathan Tan <jonathanta...@google.com> wrote:


 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
+a group of one or more lines in which at least one line contains a
+colon (by default), where


Please see commit
578e6021c0819d7be1179e05e7ce0e6fdb2a01b7
for an example where I think this is overly broad.


Hmph.  That's a merge.

Merge branch 'rs/c-auto-resets-attributes'

When "%C(auto)" appears at the very beginning of the pretty format
string, it did not need to issue the reset sequence, but it did.

* rs/c-auto-resets-attributes:
  pretty: avoid adding reset for %C(auto) if output is empty

And neither of the two colon containing line remotely resembles how
a typical RFC-822 header is formatted.  So that may serve as a hint
to how we can tighten it without introducing false negative.


The only "offending" character is the space (according to RFC 822), but 
that sounds like a good rule to have.



Another made up example, that I'd want to feed
in commit -s eventually:

--8<--
demonstrate colons in Java

First paragraph is not interesting.

Also if using another Language such as Java, where I point out
Class::function() to be problematic
--8<--

This would lack the white space between the last paragraph and
the Sign off ?

So for this patch I am mostly concerned about false positives hidden
in actual text.


Yes.

These are exactly why I mentioned "if certian number or percentage"
in my earlier suggestion.

I think in practice, "A paragraph with at least one Signed-off-by:
line, and has no more than 3/4 of the (logical) lines that do not
resemble how a typical RFC-822 header is formatted" or something
along that line would give us a reasonable safety.


I think that "Signed-off-by:" is not guaranteed to be present. Defining 
a trailer line as "a line starting with a token, then optional 
whitespace, then separator", maybe the following rule:
- at least one trailer line generated by Git ("(cherry picked by" or 
"Signed-off-by") or configured in the "trailer" section in gitconfig

OR
- at least 3/4 logical trailer lines (I'm wondering if this should be 
100% trailer lines)


?


Your Java example will fail the criteria in two ways, so we'd be
safe ;-)


[PATCH v3 6/6] trailer: support values folded to multiple lines

2016-10-14 Thread Jonathan Tan
Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:

  a: first line
 second line

would be interpreted as one trailer line followed by one non-trailer line.

Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |   7 +-
 t/t7513-interpret-trailers.sh| 139 +++
 trailer.c|  22 +++--
 3 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index c480da6..cfec636 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -57,11 +57,12 @@ minus signs start the patch part of the message.
 
 When reading trailers, there can be whitespaces before and after the
 token, the separator and the value. There can also be whitespaces
-inside the token and the value.
+inside the token and the value. The value may be split over multiple lines with
+each subsequent line starting with whitespace, like the "folding" in RFC 822.
 
 Note that 'trailers' do not follow and are not intended to follow many
-rules for RFC 822 headers. For example they do not follow the line
-folding rules, the encoding rules and probably many other rules.
+rules for RFC 822 headers. For example they do not follow
+the encoding rules and probably many other rules.
 
 OPTIONS
 ---
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 7f5cd2a..31db699 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -157,6 +157,145 @@ test_expect_success 'with non-trailer lines only' '
test_cmp expected actual
 '
 
+test_expect_success 'multiline field treated as atomic for placement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   name: value
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for replacement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   another: trailer
+   name: value
+   EOF
+   test_config trailer.name.ifexists replace &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for difference check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.ifexists addIfDifferent &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line
+   Qsecond line
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   git interpret-trailers --trailer "$(cat trailer

[PATCH v3 5/6] trailer: allow non-trailers in trailer block

2016-10-14 Thread Jonathan Tan
Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line.  For example:

  echo -e "\na: b\nnot trailer" |
  git interpret-trailers --trailer "c: d"

would result in:

  a: b
  not trailer

  c: d

Relax the definition of a trailer block to only require 1 trailer, so
that trailers can be directly added to such blocks, resulting in:

  a: b
  not trailer
  c: d

This allows arbitrary lines to be included in trailer blocks, like those
in [1], and still allow interpret-trailers to be used.

This change also makes comments in the trailer block be treated as any
other non-trailer line, preserving them in the output of
interpret-trailers.

[1]
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/e7d316a02f683864a12389f8808570e37fb90aa3

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git-interpret-trailers.txt |  3 +-
 t/t7513-interpret-trailers.sh| 35 +++
 trailer.c| 77 ++--
 3 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 93d1db6..c480da6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -48,7 +48,8 @@ with only spaces at the end of the commit message part, one 
blank line
 will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
+a group of one or more lines in which at least one line contains a 
+colon (by default), where
 the group is preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index aee785c..7f5cd2a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -126,6 +126,37 @@ test_expect_success 'with multiline title in the message' '
test_cmp expected actual
 '
 
+test_expect_success 'with non-trailer lines mixed with trailer lines' '
+   cat >patch <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines only' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'with config setup' '
git config trailer.ack.key "Acked-by: " &&
cat >expected <<-\EOF &&
@@ -257,6 +288,8 @@ test_expect_success 'with message that has comments' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
@@ -286,6 +319,8 @@ test_expect_success 'with message that has an old style 
conflict block' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
diff --git a/trailer.c b/trailer.c
index a9ed3f8..d6dfc7a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,6 +27,10 @@ static struct conf_info default_conf_info;
 
 struct trailer_item {
struct list_head list;
+   /*
+* If this is not a trailer line, the line is stored in value
+* (excluding the terminating newline) and token is NULL.
+*/
char *token;
char *value;
 };
@@ -70,9 +74,14 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
 
 static int same_token(struct trailer_item *a, struct arg_item *b)
 {
-   size_t a_len = token_len_without_separator(a->token, strlen(a->token));
-   size_t b_len = token_len_without_separator(b->token, strlen(b->token));
-   size_t min_len = (a_len > b_len) ? b_len : a_len;
+   size_t a_len, b_len, min_len;
+
+   if (!a->token)
+   

[PATCH v3 4/6] trailer: make args have their own struct

2016-10-14 Thread Jonathan Tan
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 135 +++---
 1 file changed, 85 insertions(+), 50 deletions(-)

diff --git a/trailer.c b/trailer.c
index 54cc930..a9ed3f8 100644
--- a/trailer.c
+++ b/trailer.c
@@ -29,6 +29,12 @@ struct trailer_item {
struct list_head list;
char *token;
char *value;
+};
+
+struct arg_item {
+   struct list_head list;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -62,7 +68,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(struct trailer_item *a, struct trailer_item *b)
+static int same_token(struct trailer_item *a, struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -71,12 +77,12 @@ static int same_token(struct trailer_item *a, struct 
trailer_item *b)
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(struct trailer_item *a, struct trailer_item *b)
+static int same_value(struct trailer_item *a, struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(struct trailer_item *a, struct trailer_item *b)
+static int same_trailer(struct trailer_item *a, struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -98,6 +104,13 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
@@ -137,17 +150,29 @@ static void print_all(FILE *outfile, struct list_head 
*head, int trim_empty)
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok)
+ struct arg_item *arg_tok)
 {
-   if (after_or_end(arg_tok->conf.where))
-   list_add(_tok->list, _tok->list);
+   int aoe = after_or_end(arg_tok->conf.where);
+   struct trailer_item *to_add = trailer_from_arg(arg_tok);
+   if (aoe)
+   list_add(_add->list, _tok->list);
else
-   list_add_tail(_tok->list, _tok->list);
+   list_add_tail(_add->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
- struct trailer_item *arg_tok,
+ struct arg_item *arg_tok,
  int check_all,
  struct list_head *head)
 {
@@ -200,7 +225,7 @@ static char *apply_command(const char *command, const char 
*arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -218,13 +243,13 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
 }
 
 static void apply_arg_if_exists(struct trailer_item *in_tok,
-   struct trailer_item *arg_tok,
+   struct arg_item *arg_tok,
struct trailer_item *on_tok,
struct list_head *head)
 {
switch (arg_tok->conf.if_exists) {
case EXISTS_DO_NOTHING:
-   free_trailer_item(arg_tok);
+   free_arg_item(arg_tok);
break;
case EXISTS_REPLACE:
apply_item_command(in_tok, arg_tok);
@@ -241,39 +266,41 @@ static void apply_arg_if_exists(struct trailer_item 
*in_tok,
if (check_if_different(in_tok, arg_tok, 1, head))
add_arg_to_input_list(on_tok, arg_tok);
 

[PATCH v3 2/6] trailer: use list.h for doubly-linked list

2016-10-14 Thread Jonathan Tan
Replace the existing handwritten implementation of a doubly-linked list
in trailer.c with the functions and macros from list.h. This
significantly simplifies the code.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 258 ++
 1 file changed, 91 insertions(+), 167 deletions(-)

diff --git a/trailer.c b/trailer.c
index 1f191b2..0afa240 100644
--- a/trailer.c
+++ b/trailer.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "tempfile.h"
 #include "trailer.h"
+#include "list.h"
 /*
  * Copyright (c) 2013, 2014 Christian Couder <chrisc...@tuxfamily.org>
  */
@@ -25,19 +26,24 @@ struct conf_info {
 static struct conf_info default_conf_info;
 
 struct trailer_item {
-   struct trailer_item *previous;
-   struct trailer_item *next;
+   struct list_head list;
char *token;
char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
 #define TRAILER_ARG_STRING "$ARG"
 
+/* Iterate over the elements of the list. */
+#define list_for_each_dir(pos, head, is_reverse) \
+   for (pos = is_reverse ? (head)->prev : (head)->next; \
+   pos != (head); \
+   pos = is_reverse ? pos->prev : pos->next)
+
 static int after_or_end(enum action_where where)
 {
return (where == WHERE_AFTER) || (where == WHERE_END);
@@ -120,101 +126,49 @@ static void print_tok_val(FILE *outfile, const char 
*tok, const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct trailer_item *first, int 
trim_empty)
+static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
 {
+   struct list_head *pos;
struct trailer_item *item;
-   for (item = first; item; item = item->next) {
+   list_for_each(pos, head) {
+   item = list_entry(pos, struct trailer_item, list);
if (!trim_empty || strlen(item->value) > 0)
print_tok_val(outfile, item->token, item->value);
}
 }
 
-static void update_last(struct trailer_item **last)
-{
-   if (*last)
-   while ((*last)->next != NULL)
-   *last = (*last)->next;
-}
-
-static void update_first(struct trailer_item **first)
-{
-   if (*first)
-   while ((*first)->previous != NULL)
-   *first = (*first)->previous;
-}
-
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok,
- struct trailer_item **first,
- struct trailer_item **last)
-{
-   if (after_or_end(arg_tok->conf.where)) {
-   arg_tok->next = on_tok->next;
-   on_tok->next = arg_tok;
-   arg_tok->previous = on_tok;
-   if (arg_tok->next)
-   arg_tok->next->previous = arg_tok;
-   update_last(last);
-   } else {
-   arg_tok->previous = on_tok->previous;
-   on_tok->previous = arg_tok;
-   arg_tok->next = on_tok;
-   if (arg_tok->previous)
-   arg_tok->previous->next = arg_tok;
-   update_first(first);
-   }
+ struct trailer_item *arg_tok)
+{
+   if (after_or_end(arg_tok->conf.where))
+   list_add(_tok->list, _tok->list);
+   else
+   list_add_tail(_tok->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
  struct trailer_item *arg_tok,
- int check_all)
+ int check_all,
+ struct list_head *head)
 {
enum action_where where = arg_tok->conf.where;
+   struct list_head *next_head;
do {
-   if (!in_tok)
-   return 1;
if (same_trailer(in_tok, arg_tok))
return 0;
/*
 * if we want to add a trailer after another one,
 * we have to check those before this one
 */
-   in_tok = after_or_end(where) ? in_tok->previous : in_tok->next;
+   next_head = after_or_end(where) ? in_tok->list.prev
+   : in_tok->list.next;
+   if (next_head == head)
+   break;
+   in_tok = list_entry(next_head, struct trailer_item, list);
} while (check_all);
return 1;
 }
 
-static void remove_from_list(struct trailer_item *item,
-   

[PATCH v3 3/6] trailer: streamline trailer item create and add

2016-10-14 Thread Jonathan Tan
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 130 +-
 1 file changed, 60 insertions(+), 70 deletions(-)

diff --git a/trailer.c b/trailer.c
index 0afa240..54cc930 100644
--- a/trailer.c
+++ b/trailer.c
@@ -500,10 +500,31 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+   struct list_head *pos;
+
strbuf_addstr(, separators);
strbuf_addch(, '=');
len = strcspn(trailer, seps.buf);
@@ -523,74 +544,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(>conf, _item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(>conf, _conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-   struct list_head *pos;
-
-   if (parse_trailer(, , string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = _conf_info;
list_for_each(pos, _head) {
item = list_entry(pos, struct trailer_item, list);
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = >conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct list_head *head, struct trailer_item *new)
+static void add_trailer_item(struct list_head *head, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(>conf, conf);
list_add_tail(>list, head);
 }
 
@@ -599,21 +577,28 @@ static void process_command_line_args(struct list_head 
*arg_head,
 {
struct string_list_item *

[PATCH v3 1/6] trailer: improve const correctness

2016-10-14 Thread Jonathan Tan
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).

Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 trailer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index c6ea9ac..1f191b2 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,8 +27,8 @@ static struct conf_info default_conf_info;
 struct trailer_item {
struct trailer_item *previous;
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -95,8 +95,8 @@ static void free_trailer_item(struct trailer_item *item)
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -215,13 +215,13 @@ static struct trailer_item *remove_first(struct 
trailer_item **first)
return item;
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(, command);
if (arg)
@@ -425,7 +425,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
-- 
2.8.0.rc3.226.g39d4020



[PATCH v3 0/6] allow non-trailers and multiple-line trailers

2016-10-14 Thread Jonathan Tan
Ah, I knew I forgot something. These are exactly the same as v2, except
that these are signed off.

Jonathan Tan (6):
  trailer: improve const correctness
  trailer: use list.h for doubly-linked list
  trailer: streamline trailer item create and add
  trailer: make args have their own struct
  trailer: allow non-trailers in trailer block
  trailer: support values folded to multiple lines

 Documentation/git-interpret-trailers.txt |  10 +-
 t/t7513-interpret-trailers.sh| 174 ++
 trailer.c| 538 +++
 3 files changed, 444 insertions(+), 278 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 2/6] trailer: use list.h for doubly-linked list

2016-10-12 Thread Jonathan Tan
Replace the existing handwritten implementation of a doubly-linked list
in trailer.c with the functions and macros from list.h. This
significantly simplifies the code.
---
 trailer.c | 258 ++
 1 file changed, 91 insertions(+), 167 deletions(-)

diff --git a/trailer.c b/trailer.c
index 1f191b2..0afa240 100644
--- a/trailer.c
+++ b/trailer.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "tempfile.h"
 #include "trailer.h"
+#include "list.h"
 /*
  * Copyright (c) 2013, 2014 Christian Couder 
  */
@@ -25,19 +26,24 @@ struct conf_info {
 static struct conf_info default_conf_info;
 
 struct trailer_item {
-   struct trailer_item *previous;
-   struct trailer_item *next;
+   struct list_head list;
char *token;
char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+LIST_HEAD(conf_head);
 
 static char *separators = ":";
 
 #define TRAILER_ARG_STRING "$ARG"
 
+/* Iterate over the elements of the list. */
+#define list_for_each_dir(pos, head, is_reverse) \
+   for (pos = is_reverse ? (head)->prev : (head)->next; \
+   pos != (head); \
+   pos = is_reverse ? pos->prev : pos->next)
+
 static int after_or_end(enum action_where where)
 {
return (where == WHERE_AFTER) || (where == WHERE_END);
@@ -120,101 +126,49 @@ static void print_tok_val(FILE *outfile, const char 
*tok, const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct trailer_item *first, int 
trim_empty)
+static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
 {
+   struct list_head *pos;
struct trailer_item *item;
-   for (item = first; item; item = item->next) {
+   list_for_each(pos, head) {
+   item = list_entry(pos, struct trailer_item, list);
if (!trim_empty || strlen(item->value) > 0)
print_tok_val(outfile, item->token, item->value);
}
 }
 
-static void update_last(struct trailer_item **last)
-{
-   if (*last)
-   while ((*last)->next != NULL)
-   *last = (*last)->next;
-}
-
-static void update_first(struct trailer_item **first)
-{
-   if (*first)
-   while ((*first)->previous != NULL)
-   *first = (*first)->previous;
-}
-
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok,
- struct trailer_item **first,
- struct trailer_item **last)
-{
-   if (after_or_end(arg_tok->conf.where)) {
-   arg_tok->next = on_tok->next;
-   on_tok->next = arg_tok;
-   arg_tok->previous = on_tok;
-   if (arg_tok->next)
-   arg_tok->next->previous = arg_tok;
-   update_last(last);
-   } else {
-   arg_tok->previous = on_tok->previous;
-   on_tok->previous = arg_tok;
-   arg_tok->next = on_tok;
-   if (arg_tok->previous)
-   arg_tok->previous->next = arg_tok;
-   update_first(first);
-   }
+ struct trailer_item *arg_tok)
+{
+   if (after_or_end(arg_tok->conf.where))
+   list_add(_tok->list, _tok->list);
+   else
+   list_add_tail(_tok->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
  struct trailer_item *arg_tok,
- int check_all)
+ int check_all,
+ struct list_head *head)
 {
enum action_where where = arg_tok->conf.where;
+   struct list_head *next_head;
do {
-   if (!in_tok)
-   return 1;
if (same_trailer(in_tok, arg_tok))
return 0;
/*
 * if we want to add a trailer after another one,
 * we have to check those before this one
 */
-   in_tok = after_or_end(where) ? in_tok->previous : in_tok->next;
+   next_head = after_or_end(where) ? in_tok->list.prev
+   : in_tok->list.next;
+   if (next_head == head)
+   break;
+   in_tok = list_entry(next_head, struct trailer_item, list);
} while (check_all);
return 1;
 }
 
-static void remove_from_list(struct trailer_item *item,
-struct trailer_item **first,
-struct trailer_item **last)
-{
-   struct trailer_item *next = item->next;
-   struct trailer_item *previous = item->previous;
-
-   if (next) {
-   item->next->previous = 

[PATCH v2 5/6] trailer: allow non-trailers in trailer block

2016-10-12 Thread Jonathan Tan
Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line.  For example:

  echo -e "\na: b\nnot trailer" |
  git interpret-trailers --trailer "c: d"

would result in:

  a: b
  not trailer

  c: d

Relax the definition of a trailer block to only require 1 trailer, so
that trailers can be directly added to such blocks, resulting in:

  a: b
  not trailer
  c: d

This allows arbitrary lines to be included in trailer blocks, like those
in [1], and still allow interpret-trailers to be used.

This change also makes comments in the trailer block be treated as any
other non-trailer line, preserving them in the output of
interpret-trailers.

[1]
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/e7d316a02f683864a12389f8808570e37fb90aa3
---
 Documentation/git-interpret-trailers.txt |  3 +-
 t/t7513-interpret-trailers.sh| 35 +++
 trailer.c| 77 ++--
 3 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 93d1db6..c480da6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -48,7 +48,8 @@ with only spaces at the end of the commit message part, one 
blank line
 will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
+a group of one or more lines in which at least one line contains a 
+colon (by default), where
 the group is preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index aee785c..7f5cd2a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -126,6 +126,37 @@ test_expect_success 'with multiline title in the message' '
test_cmp expected actual
 '
 
+test_expect_success 'with non-trailer lines mixed with trailer lines' '
+   cat >patch <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines only' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'with config setup' '
git config trailer.ack.key "Acked-by: " &&
cat >expected <<-\EOF &&
@@ -257,6 +288,8 @@ test_expect_success 'with message that has comments' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
@@ -286,6 +319,8 @@ test_expect_success 'with message that has an old style 
conflict block' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
diff --git a/trailer.c b/trailer.c
index a9ed3f8..d6dfc7a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,6 +27,10 @@ static struct conf_info default_conf_info;
 
 struct trailer_item {
struct list_head list;
+   /*
+* If this is not a trailer line, the line is stored in value
+* (excluding the terminating newline) and token is NULL.
+*/
char *token;
char *value;
 };
@@ -70,9 +74,14 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
 
 static int same_token(struct trailer_item *a, struct arg_item *b)
 {
-   size_t a_len = token_len_without_separator(a->token, strlen(a->token));
-   size_t b_len = token_len_without_separator(b->token, strlen(b->token));
-   size_t min_len = (a_len > b_len) ? b_len : a_len;
+   size_t a_len, b_len, min_len;
+
+   if (!a->token)
+   return 0;
+
+   a_len = token_len_without_separator(a->token, strlen(a->token));
+   b_len = token_len_without_separator(b->token, strlen(b->token));
+   min_len = (a_len > b_len) ? b_len : a_len;
 
return !strncasecmp(a->token, b->token, min_len);
 }
@@ -130,7 

[PATCH v2 4/6] trailer: make args have their own struct

2016-10-12 Thread Jonathan Tan
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.
---
 trailer.c | 135 +++---
 1 file changed, 85 insertions(+), 50 deletions(-)

diff --git a/trailer.c b/trailer.c
index 54cc930..a9ed3f8 100644
--- a/trailer.c
+++ b/trailer.c
@@ -29,6 +29,12 @@ struct trailer_item {
struct list_head list;
char *token;
char *value;
+};
+
+struct arg_item {
+   struct list_head list;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -62,7 +68,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(struct trailer_item *a, struct trailer_item *b)
+static int same_token(struct trailer_item *a, struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -71,12 +77,12 @@ static int same_token(struct trailer_item *a, struct 
trailer_item *b)
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(struct trailer_item *a, struct trailer_item *b)
+static int same_value(struct trailer_item *a, struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(struct trailer_item *a, struct trailer_item *b)
+static int same_trailer(struct trailer_item *a, struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -98,6 +104,13 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
@@ -137,17 +150,29 @@ static void print_all(FILE *outfile, struct list_head 
*head, int trim_empty)
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok)
+ struct arg_item *arg_tok)
 {
-   if (after_or_end(arg_tok->conf.where))
-   list_add(_tok->list, _tok->list);
+   int aoe = after_or_end(arg_tok->conf.where);
+   struct trailer_item *to_add = trailer_from_arg(arg_tok);
+   if (aoe)
+   list_add(_add->list, _tok->list);
else
-   list_add_tail(_tok->list, _tok->list);
+   list_add_tail(_add->list, _tok->list);
 }
 
 static int check_if_different(struct trailer_item *in_tok,
- struct trailer_item *arg_tok,
+ struct arg_item *arg_tok,
  int check_all,
  struct list_head *head)
 {
@@ -200,7 +225,7 @@ static char *apply_command(const char *command, const char 
*arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -218,13 +243,13 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
 }
 
 static void apply_arg_if_exists(struct trailer_item *in_tok,
-   struct trailer_item *arg_tok,
+   struct arg_item *arg_tok,
struct trailer_item *on_tok,
struct list_head *head)
 {
switch (arg_tok->conf.if_exists) {
case EXISTS_DO_NOTHING:
-   free_trailer_item(arg_tok);
+   free_arg_item(arg_tok);
break;
case EXISTS_REPLACE:
apply_item_command(in_tok, arg_tok);
@@ -241,39 +266,41 @@ static void apply_arg_if_exists(struct trailer_item 
*in_tok,
if (check_if_different(in_tok, arg_tok, 1, head))
add_arg_to_input_list(on_tok, arg_tok);
else
-   free_trailer_item(arg_tok);
+   free_arg_item(arg_tok);
break;
case EXISTS_ADD_IF_DIFFERENT_NEIGHBOR:

[PATCH v2 3/6] trailer: streamline trailer item create and add

2016-10-12 Thread Jonathan Tan
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.
---
 trailer.c | 130 +-
 1 file changed, 60 insertions(+), 70 deletions(-)

diff --git a/trailer.c b/trailer.c
index 0afa240..54cc930 100644
--- a/trailer.c
+++ b/trailer.c
@@ -500,10 +500,31 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+   struct list_head *pos;
+
strbuf_addstr(, separators);
strbuf_addch(, '=');
len = strcspn(trailer, seps.buf);
@@ -523,74 +544,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(>conf, _item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(>conf, _conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-   struct list_head *pos;
-
-   if (parse_trailer(, , string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = _conf_info;
list_for_each(pos, _head) {
item = list_entry(pos, struct trailer_item, list);
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = >conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct list_head *head, struct trailer_item *new)
+static void add_trailer_item(struct list_head *head, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(>conf, conf);
list_add_tail(>list, head);
 }
 
@@ -599,21 +577,28 @@ static void process_command_line_args(struct list_head 
*arg_head,
 {
struct string_list_item *tr;
struct trailer_item *item;
+   struct strbuf tok = STRBUF_INIT;
+   struct strbuf val = STRBUF_INIT;
+   const struct conf_info 

[PATCH v2 6/6] trailer: support values folded to multiple lines

2016-10-12 Thread Jonathan Tan
Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:

  a: first line
 second line

would be interpreted as one trailer line followed by one non-trailer line.

Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.
---
 Documentation/git-interpret-trailers.txt |   7 +-
 t/t7513-interpret-trailers.sh| 139 +++
 trailer.c|  22 +++--
 3 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index c480da6..cfec636 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -57,11 +57,12 @@ minus signs start the patch part of the message.
 
 When reading trailers, there can be whitespaces before and after the
 token, the separator and the value. There can also be whitespaces
-inside the token and the value.
+inside the token and the value. The value may be split over multiple lines with
+each subsequent line starting with whitespace, like the "folding" in RFC 822.
 
 Note that 'trailers' do not follow and are not intended to follow many
-rules for RFC 822 headers. For example they do not follow the line
-folding rules, the encoding rules and probably many other rules.
+rules for RFC 822 headers. For example they do not follow
+the encoding rules and probably many other rules.
 
 OPTIONS
 ---
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 7f5cd2a..31db699 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -157,6 +157,145 @@ test_expect_success 'with non-trailer lines only' '
test_cmp expected actual
 '
 
+test_expect_success 'multiline field treated as atomic for placement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   name: value
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for replacement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   another: trailer
+   name: value
+   EOF
+   test_config trailer.name.ifexists replace &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for difference check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.ifexists addIfDifferent &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line
+   Qsecond line
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for neighbor check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   test_config 

[PATCH v2 1/6] trailer: improve const correctness

2016-10-12 Thread Jonathan Tan
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).

Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).
---
 trailer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index c6ea9ac..1f191b2 100644
--- a/trailer.c
+++ b/trailer.c
@@ -27,8 +27,8 @@ static struct conf_info default_conf_info;
 struct trailer_item {
struct trailer_item *previous;
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
@@ -95,8 +95,8 @@ static void free_trailer_item(struct trailer_item *item)
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -215,13 +215,13 @@ static struct trailer_item *remove_first(struct 
trailer_item **first)
return item;
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(, command);
if (arg)
@@ -425,7 +425,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 0/6] allow non-trailers and multiple-line trailers

2016-10-12 Thread Jonathan Tan
Thanks, Peff, for the pointer to list.h. Using list.h does simplify the
code by a similar amount to switching it to a singly-linked list, so I
have done that (replacing my earlier "trailer: use singly-linked list,
not doubly" patch). Another advantage is that I no longer need to change
the algorithm, making for a smaller patch.

(There are some quirks resulting from list.h implementing a circular
list, like needing to pass "head" as a sentinel when iterating from the
middle of the list, but those are minor, and my original singly-linked
list implementation had quirks too anyway like needing to pass a pointer
to the next pointer.)

Updates:
 (-> 1/6)
 - Added separate patch for const correctness changes
 (1/5 -> 2/6)
 - Dropped singly-linked list patch, instead replacing existing
   doubly-linked list implementation with list.h
 (5/5 -> 6/6)
 - Used "char *" instead of "struct strbuf"
 - Modified test slightly to test whitespace at beginning of line

Jonathan Tan (6):
  trailer: improve const correctness
  trailer: use list.h for doubly-linked list
  trailer: streamline trailer item create and add
  trailer: make args have their own struct
  trailer: allow non-trailers in trailer block
  trailer: support values folded to multiple lines

 Documentation/git-interpret-trailers.txt |  10 +-
 t/t7513-interpret-trailers.sh| 174 ++
 trailer.c| 538 +++
 3 files changed, 444 insertions(+), 278 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



[PATCH 3/5] trailer: make args have their own struct

2016-10-11 Thread Jonathan Tan
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

Also take the opportunity to refine the "struct trailer_item" definition
by removing the conf (which is used only by arguments) and by removing
const on the string fields, since those fields are owned by the struct.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.
---
 trailer.c | 161 ++
 1 file changed, 99 insertions(+), 62 deletions(-)

diff --git a/trailer.c b/trailer.c
index e8b1bfb..167b2fd 100644
--- a/trailer.c
+++ b/trailer.c
@@ -26,12 +26,18 @@ static struct conf_info default_conf_info;
 
 struct trailer_item {
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
+};
+
+struct arg_item {
+   struct arg_item *next;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+static struct arg_item *first_conf_item;
 
 static char *separators = ":";
 
@@ -55,8 +61,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(const struct trailer_item *a,
- const struct trailer_item *b)
+static int same_token(const struct trailer_item *a, const struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -65,14 +70,12 @@ static int same_token(const struct trailer_item *a,
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(const struct trailer_item *a,
- const struct trailer_item *b)
+static int same_value(const struct trailer_item *a, const struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(const struct trailer_item *a,
-   const struct trailer_item *b)
+static int same_trailer(const struct trailer_item *a, const struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -94,11 +97,18 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -131,13 +141,13 @@ static void print_all(FILE *outfile, struct trailer_item 
*first, int trim_empty)
}
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(, command);
if (arg)
@@ -162,7 +172,7 @@ static const char *apply_command(const char *command, const 
char *arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -179,60 +189,77 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void apply_existing_arg(struct trailer_item **found_next,
-  struct trailer_item *arg_tok,
+  struct arg_item *arg_tok,
   struct trailer_item **position_to_add,
   const struct trailer_item *in_tok_head,
   const struct trailer_item *neighbor)
 {
-   if (arg_tok->conf.if_exists == EXISTS_DO_NOTHING) {
-   free_trailer_item(arg_tok);
+   enum action_if_exists if_exists = arg_tok->conf.if_exists;
+   struct trailer_item *to_add;
+
+   if (if_exists == EXISTS_DO_NOTHING) {
+   free_arg_item(arg_tok);
return;
}
 

[PATCH 2/5] trailer: streamline trailer item create and add

2016-10-11 Thread Jonathan Tan
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.
---
 trailer.c | 135 +-
 1 file changed, 62 insertions(+), 73 deletions(-)

diff --git a/trailer.c b/trailer.c
index bf3d7d0..e8b1bfb 100644
--- a/trailer.c
+++ b/trailer.c
@@ -335,7 +335,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
@@ -467,10 +467,30 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+
strbuf_addstr(, separators);
strbuf_addch(, '=');
len = strcspn(trailer, seps.buf);
@@ -490,73 +510,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(>conf, _item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(>conf, _conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-
-   if (parse_trailer(, , string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = _conf_info;
for (item = first_conf_item; item; item = item->next) {
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = >conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(, NULL),
-   strbuf_detach(, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct trailer_item ***tail,
-struct trailer_item *new)
+static void add_trailer_item(struct trailer_item ***tail, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(>conf, conf);
+
**tail = new;
*tail = >next;
 }
@@ -567,19 

<    5   6   7   8   9   10   11   >