[PATCH v2 0/2] Cookie redaction during GIT_TRACE_CURL

2018-01-18 Thread Jonathan Tan
Thanks, Eric. Changes in v2:
 - documented all environment variables introduced
 - made the test clearer by ensuring that no cookie keys are suffixes or
   prefixes of others
 - added a test for an empty cookie value

As far as I can tell, it does not seem possible for Git to generate a
cookie with no equals sign (like the "Secure" or "HttpOnly" attributes
described in RFC 6265). When I try to craft a cookie file containing
one (using "Set-Cookie: Foo=; HttpOnly", for example), the
no-equals-sign cookie just disappears.

Jonathan Tan (2):
  http: support cookie redaction when tracing
  http: support omitting data from traces

 Documentation/git.txt       | 10 ++++++++++
 http.c                      | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 t/t5551-http-fetch-smart.sh | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 117 insertions(+), 8 deletions(-)

-- 
2.16.0.rc2.37.ge0d575025.dirty



[PATCH v2 2/2] http: support omitting data from traces

2018-01-18 Thread Jonathan Tan
GIT_TRACE_CURL provides a way to debug what is being sent and received
over HTTP, with automatic redaction of sensitive information. But it
also logs data transmissions, which significantly increases the log file
size, sometimes unnecessarily. Add an option "GIT_TRACE_CURL_NO_DATA" to
allow the user to omit such data transmissions.
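
To illustrate the effect (not part of the patch; "origin" stands in for
any HTTP remote):

    # full trace: info lines, headers, and a dump of every transfer
    GIT_TRACE_CURL=true git fetch origin 2>full.log

    # info lines and headers only: no "=> Send data"/"<= Recv data" dumps
    GIT_TRACE_CURL=true GIT_TRACE_CURL_NO_DATA=1 git fetch origin 2>slim.log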

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git.txt       |  4 ++++
 http.c                      | 27 +++++++++++++++++++--------
 t/t5551-http-fetch-smart.sh | 12 ++++++++++++
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 5446d2143..8163b5796 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -646,6 +646,10 @@ of clones and fetches.
variable.
See `GIT_TRACE` for available trace output options.
 
+`GIT_TRACE_CURL_NO_DATA`::
+   When a curl trace is enabled (see `GIT_TRACE_CURL` above), do not dump
+   data (that is, only dump info lines and headers).
+
 `GIT_REDACT_COOKIES`::
This can be set to a comma-separated list of strings. When a curl trace
is enabled (see `GIT_TRACE_CURL` above), whenever a "Cookie:" header
diff --git a/http.c b/http.c
index 088cf70bf..32a823895 100644
--- a/http.c
+++ b/http.c
@@ -16,6 +16,7 @@
 #include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static int trace_curl_data = 1;
 static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
@@ -695,24 +696,32 @@ static int curl_trace(CURL *handle, curl_infotype type, char *data, size_t size,
curl_dump_header(text, (unsigned char *)data, size, DO_FILTER);
break;
case CURLINFO_DATA_OUT:
-   text = "=> Send data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_OUT:
-   text = "=> Send SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_HEADER_IN:
text = "<= Recv header";
curl_dump_header(text, (unsigned char *)data, size, NO_FILTER);
break;
case CURLINFO_DATA_IN:
-   text = "<= Recv data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_IN:
-   text = "<= Recv SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
 
default:/* we ignore unknown types by default */
@@ -857,6 +866,8 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_TRACE_CURL_NO_DATA"))
+   trace_curl_data = 0;
if (getenv("GIT_REDACT_COOKIES")) {
string_list_split(&cookies_to_redact,
  getenv("GIT_REDACT_COOKIES"), ',', -1);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 21a5ce860..f5721b4a5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -385,5 +385,17 @@ test_expect_success 'GIT_REDACT_COOKIES handles empty values' '
grep "Cookie:.*Foo=<redacted>" err
 '
 
+test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
+   rm -rf clone &&
+   GIT_TRACE_CURL=true \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   grep "=> Send data" err &&
+
+   rm -rf clone &&
+   GIT_TRACE_CURL=true GIT_TRACE_CURL_NO_DATA=1 \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   ! grep "=> Send data" err
+'
+
 stop_httpd
 test_done
-- 
2.16.0.rc2.37.ge0d575025.dirty



[PATCH v2 1/2] http: support cookie redaction when tracing

2018-01-18 Thread Jonathan Tan
When using GIT_TRACE_CURL, Git already redacts the "Authorization:" and
"Proxy-Authorization:" HTTP headers. Extend this redaction to a
user-specified list of cookies, specified through the
"GIT_REDACT_COOKIES" environment variable.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Documentation/git.txt       |  6 ++++++
 http.c                      | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/t5551-http-fetch-smart.sh | 21 +++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 3f4161a79..5446d2143 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -646,6 +646,12 @@ of clones and fetches.
variable.
See `GIT_TRACE` for available trace output options.
 
+`GIT_REDACT_COOKIES`::
+   This can be set to a comma-separated list of strings. When a curl trace
+   is enabled (see `GIT_TRACE_CURL` above), whenever a "Cookie:" header
+   sent by the client is dumped, values of cookies whose key is in that
+   list (case-sensitive) are redacted.
+
 `GIT_LITERAL_PATHSPECS`::
Setting this variable to `1` will cause Git to treat all
pathspecs literally, rather than as glob patterns. For example,
diff --git a/http.c b/http.c
index 597771271..088cf70bf 100644
--- a/http.c
+++ b/http.c
@@ -13,8 +13,10 @@
 #include "transport.h"
 #include "packfile.h"
 #include "protocol.h"
+#include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
 #else
@@ -575,6 +577,54 @@ static void redact_sensitive_header(struct strbuf *header)
/* Everything else is opaque and possibly sensitive */
strbuf_setlen(header, sensitive_header - header->buf);
strbuf_addstr(header, " <redacted>");
+   } else if (cookies_to_redact.nr &&
+  skip_prefix(header->buf, "Cookie:", &sensitive_header)) {
+   struct strbuf redacted_header = STRBUF_INIT;
+   char *cookie;
+
+   while (isspace(*sensitive_header))
+   sensitive_header++;
+
+   /*
+* The contents of header starting from sensitive_header will
+* subsequently be overridden, so it is fine to mutate this
+* string (hence the assignment to "char *").
+*/
+   cookie = (char *) sensitive_header;
+
+   while (cookie) {
+   char *equals;
+   char *semicolon = strstr(cookie, "; ");
+   if (semicolon)
+   *semicolon = 0;
+   equals = strchrnul(cookie, '=');
+   if (!*equals) {
+   /* invalid cookie, just append and continue */
+   strbuf_addstr(&redacted_header, cookie);
+   continue;
+   }
+   *equals = 0; /* temporarily set to NUL for lookup */
+   if (string_list_lookup(&cookies_to_redact, cookie)) {
+   strbuf_addstr(&redacted_header, cookie);
+   strbuf_addstr(&redacted_header, "=<redacted>");
+   } else {
+   *equals = '=';
+   strbuf_addstr(&redacted_header, cookie);
+   }
+   if (semicolon) {
+   /*
+* There are more cookies. (Or, for some
+* reason, the input string ends in "; ".)
+*/
+   strbuf_addstr(&redacted_header, "; ");
+   cookie = semicolon + strlen("; ");
+   } else {
+   cookie = NULL;
+   }
+   }
+
+   strbuf_setlen(header, sensitive_header - header->buf);
+   strbuf_addbuf(header, &redacted_header);
}
 }
 
@@ -807,6 +857,11 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_REDACT_COOKIES")) {
+   string_list_split(&cookies_to_redact,
+ getenv("GIT_REDACT_COOKIES"), ',', -1);
+   string_list_sort(&cookies_to_redact);
+   }
 
curl_easy_setopt(result, CURLOPT_USERAGENT,
user_agent ? user_agent : git_user_agent());
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fe

Re: What's cooking in git.git (Jan 2018, #02; Tue, 9)

2018-01-18 Thread Jonathan Tan
On Thu, 18 Jan 2018 09:56:50 +0100
Christian Couder  wrote:

> I am still not very happy with fetch_object() not returning anything.
> I wonder what happens when that function is used to fetch from a repo
> that cannot provide the requested object.

My idea was to save a verification step - the caller of fetch_object()
needs to reattempt the object load anyway (which includes a verification
that the object exists), so I didn't see the need to have fetch_object()
do it too.

> Also I think the "extensions.partialclone = <remote>" config option is
> not very future proof. If people start using partial clone, it is
> likely that at some point they will need their repo to talk to more
> than one remote that is partial clone enabled and I don't see how such
> a config option can scale in this case.

In the case that they want to talk to more than one
partial-clone-enabled repo, I think that there still needs to be one
"default" remote from which missing objects are fetched. I can think of
a few reasons for that - for example, (a) we need to support commands
that give a nonexistent-in-repo SHA-1 directly, and (b) existing Git
code relies on the ability to fetch an object given only its SHA-1. (a)
can be overcome by forbidding that (?) and (b) can be overcome by an
overhaul of the object-fetching and object-using code, but I don't think
both (a) and (b) will occur anytime soon.


Re: [PATCH 00/11] Some fixes and bunch of object_id conversions

2018-01-18 Thread Jonathan Tan
On Thu, 18 Jan 2018 15:50:52 +0100
Patryk Obara  wrote:

> Patch 1 is not directly related to object_id conversions but helped with
> debugging t5540, which kept failing on master for me (spoiler: it was Fedora
> fault).  It helps with debugging of failing git-push over HTTP in case of
> internal server error, so I think it might be worthwhile.

This patch seems reasonable to me - it is a little strange that,
currently, an error message is printed upon an XML error and upon
failure of start_active_slot(), but not when the slot cannot be run.
This patch fills the gap.

> Patch 2 is a small adjustment to .clang-format, which prevents unnecessary
> line breaks after function return type.

I'm not very familiar with clang-format, so no comment from me here.

> Patch 6 is a tiny fix in oidclr function.

Ah, good catch. I would write the memset line as
"memset(oid, 0, sizeof(*oid))", but the existing one is fine too.

> All other patches are progressive conversions to struct object_id with some
> formatting fixes sprinkled in. These should be somewhat uncontroversial, I 
> hope.

I've looked at those and they appear to be obviously correct.


[RFC PATCH 2/2] http: support omitting data from traces

2018-01-17 Thread Jonathan Tan
GIT_TRACE_CURL provides a way to debug what is being sent and received
over HTTP, with automatic redaction of sensitive information. But it
also logs data transmissions, which significantly increases the log file
size, sometimes unnecessarily. Add an option "GIT_TRACE_CURL_NO_DATA" to
allow the user to omit such data transmissions.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 http.c                      | 27 +++++++++++++++++++--------
 t/t5551-http-fetch-smart.sh | 12 ++++++++++++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/http.c b/http.c
index 088cf70bf..32a823895 100644
--- a/http.c
+++ b/http.c
@@ -16,6 +16,7 @@
 #include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static int trace_curl_data = 1;
 static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
@@ -695,24 +696,32 @@ static int curl_trace(CURL *handle, curl_infotype type, char *data, size_t size,
curl_dump_header(text, (unsigned char *)data, size, DO_FILTER);
break;
case CURLINFO_DATA_OUT:
-   text = "=> Send data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_OUT:
-   text = "=> Send SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_HEADER_IN:
text = "<= Recv header";
curl_dump_header(text, (unsigned char *)data, size, NO_FILTER);
break;
case CURLINFO_DATA_IN:
-   text = "<= Recv data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_IN:
-   text = "<= Recv SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
 
default:/* we ignore unknown types by default */
@@ -857,6 +866,8 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_TRACE_CURL_NO_DATA"))
+   trace_curl_data = 0;
if (getenv("GIT_REDACT_COOKIES")) {
string_list_split(&cookies_to_redact,
  getenv("GIT_REDACT_COOKIES"), ',', -1);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 88bd9c094..0c62d00af 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -376,5 +376,17 @@ test_expect_success 'GIT_REDACT_COOKIES redacts cookies' '
! grep "Cookie:.*Secret=Bar" err
 '
 
+test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
+   rm -rf clone &&
+   GIT_TRACE_CURL=true \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   grep "=> Send data" err &&
+
+   rm -rf clone &&
+   GIT_TRACE_CURL=true GIT_TRACE_CURL_NO_DATA=1 \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   ! grep "=> Send data" err
+'
+
 stop_httpd
 test_done
-- 
2.16.0.rc1.238.g530d649a79-goog



[RFC PATCH 0/2] Cookie redaction during GIT_TRACE_CURL

2018-01-17 Thread Jonathan Tan
Sometimes authentication information is sent over HTTP through cookies,
but when using GIT_TRACE_CURL, that information appears in logs. There
are some HTTP headers already redacted ("Authorization:" and
"Proxy-Authorization:") - the first patch extends such redaction to a
user-specified list.

I've also included another patch to allow omitting data transmissions
from the log when using GIT_TRACE_CURL. This reduces the information
logged to roughly what GIT_CURL_VERBOSE logs. (As for why not use
GIT_CURL_VERBOSE instead: GIT_CURL_VERBOSE does not perform any
redaction, merely using Curl's default logging mechanism.)
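
To make the comparison concrete, a sketch of the two knobs side by side
("origin" is a placeholder):

    # curl's own verbose output: nothing is redacted
    GIT_CURL_VERBOSE=1 git fetch origin 2>verbose.log

    # Git's curl trace: "Authorization:" and "Proxy-Authorization:"
    # headers (and, with patch 1, the listed cookies) are redacted
    GIT_TRACE_CURL=true git fetch origin 2>trace.log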

The patches are ready for merging, but I marked this as "RFC" just in
case there is a better way to accomplish this.

Jonathan Tan (2):
  http: support cookie redaction when tracing
  http: support omitting data from traces

 http.c                      | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 t/t5551-http-fetch-smart.sh | 24 ++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 8 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79-goog



[RFC PATCH 1/2] http: support cookie redaction when tracing

2018-01-17 Thread Jonathan Tan
When using GIT_TRACE_CURL, Git already redacts the "Authorization:" and
"Proxy-Authorization:" HTTP headers. Extend this redaction to a
user-specified list of cookies, specified through the
"GIT_REDACT_COOKIES" environment variable.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 http.c                      | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/t5551-http-fetch-smart.sh | 12 ++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/http.c b/http.c
index 597771271..088cf70bf 100644
--- a/http.c
+++ b/http.c
@@ -13,8 +13,10 @@
 #include "transport.h"
 #include "packfile.h"
 #include "protocol.h"
+#include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
 #else
@@ -575,6 +577,54 @@ static void redact_sensitive_header(struct strbuf *header)
/* Everything else is opaque and possibly sensitive */
strbuf_setlen(header, sensitive_header - header->buf);
strbuf_addstr(header, " <redacted>");
+   } else if (cookies_to_redact.nr &&
+  skip_prefix(header->buf, "Cookie:", &sensitive_header)) {
+   struct strbuf redacted_header = STRBUF_INIT;
+   char *cookie;
+
+   while (isspace(*sensitive_header))
+   sensitive_header++;
+
+   /*
+* The contents of header starting from sensitive_header will
+* subsequently be overridden, so it is fine to mutate this
+* string (hence the assignment to "char *").
+*/
+   cookie = (char *) sensitive_header;
+
+   while (cookie) {
+   char *equals;
+   char *semicolon = strstr(cookie, "; ");
+   if (semicolon)
+   *semicolon = 0;
+   equals = strchrnul(cookie, '=');
+   if (!*equals) {
+   /* invalid cookie, just append and continue */
+   strbuf_addstr(&redacted_header, cookie);
+   continue;
+   }
+   *equals = 0; /* temporarily set to NUL for lookup */
+   if (string_list_lookup(&cookies_to_redact, cookie)) {
+   strbuf_addstr(&redacted_header, cookie);
+   strbuf_addstr(&redacted_header, "=<redacted>");
+   } else {
+   *equals = '=';
+   strbuf_addstr(&redacted_header, cookie);
+   }
+   if (semicolon) {
+   /*
+* There are more cookies. (Or, for some
+* reason, the input string ends in "; ".)
+*/
+   strbuf_addstr(&redacted_header, "; ");
+   cookie = semicolon + strlen("; ");
+   } else {
+   cookie = NULL;
+   }
+   }
+
+   strbuf_setlen(header, sensitive_header - header->buf);
+   strbuf_addbuf(header, &redacted_header);
}
 }
 
@@ -807,6 +857,11 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_REDACT_COOKIES")) {
+   string_list_split(&cookies_to_redact,
+ getenv("GIT_REDACT_COOKIES"), ',', -1);
+   string_list_sort(&cookies_to_redact);
+   }
 
curl_easy_setopt(result, CURLOPT_USERAGENT,
user_agent ? user_agent : git_user_agent());
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index a51b7e20d..88bd9c094 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -364,5 +364,17 @@ test_expect_success 'custom http headers' '
submodule update sub
 '
 
+test_expect_success 'GIT_REDACT_COOKIES redacts cookies' '
+   rm -rf clone &&
+   echo "Set-Cookie: NotSecret=Foo" >cookies &&
+   echo "Set-Cookie: Secret=Bar" >>cookies &&
+   GIT_TRACE_CURL=true GIT_REDACT_COOKIES=Secret,AnotherSecret \
+   git -c "http.cookieFile=$(pwd)/cookies" clone \
+   $HTTPD_URL/smart/repo.git clone 2>err &&
+   grep "Cookie:.*NotSecret=Foo" err &&
+   grep "Cookie:.*Secret=" err &&
+   ! grep "Cookie:.*Secret=Bar" err
+'
+
 stop_httpd
 test_done
-- 
2.16.0.rc1.238.g530d649a79-goog



Re: [PATCH 26/26] remote-curl: implement connect-half-duplex command

2018-01-10 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:28 -0800
Brandon Williams  wrote:

> +static size_t proxy_in(void *ptr, size_t eltsize,
> +size_t nmemb, void *buffer_)

OK, I managed to look at the Curl stuff in more detail.

I know that these parameter names are what remote-curl.c has been using
for its callbacks, but I find them confusing (in particular, some Curl
documentation rightly refers to the 1st parameter as a buffer, and the
4th parameter is actually userdata). Also, according to the Curl
documentation, the type of the first parameter is "char *". Could we
change the type of the first parameter to "char *", and the name of the
fourth parameter either to "proxy_state_" or "userdata"?

> +{
> + size_t max = eltsize * nmemb;
> + struct proxy_state *p = buffer_;
> + size_t avail = p->request_buffer.len - p->pos;
> +
> + if (!avail) {
> + if (p->seen_flush) {
> + p->seen_flush = 0;
> + return 0;
> + }
> +
> + strbuf_reset(&p->request_buffer);
> + switch (packet_reader_read(&p->reader)) {
> + case PACKET_READ_EOF:
> + die("error reading request from parent process");

This should say "BUG:", I think. I'm not sure what the best way of
explaining it is, but basically connect_half_duplex is supposed to
ensure (by peeking) that there is no EOF when proxy_in() is called.

> + case PACKET_READ_NORMAL:
> + packet_buf_write_len(&p->request_buffer, p->reader.line,
> +  p->reader.pktlen);
> + break;
> + case PACKET_READ_DELIM:
> + packet_buf_delim(&p->request_buffer);
> + break;
> + case PACKET_READ_FLUSH:
> + packet_buf_flush(&p->request_buffer);
> + p->seen_flush = 1;
> + break;
> + }
> + p->pos = 0;
> + avail = p->request_buffer.len;
> + }
> +
> + if (max < avail)
> + avail = max;
> + memcpy(ptr, p->request_buffer.buf + p->pos, avail);
> + p->pos += avail;
> + return avail;

Thanks, this looks correct. I wish that the Curl API had a way for us to
say "here are 4 more bytes, and that is all" instead of us having to
make a note (p->seen_flush) to remember to return 0 on the next call,
but that's the way it is.

> +}
> +static size_t proxy_out(char *ptr, size_t eltsize,
> + size_t nmemb, void *buffer_)

Add a blank line before proxy_out. Also, same comment as proxy_in()
about the function signature.

> +{
> + size_t size = eltsize * nmemb;
> + struct proxy_state *p = buffer_;
> +
> + write_or_die(p->out, ptr, size);
> + return size;
> +}
> +
> +static int proxy_post(struct proxy_state *p)
> +{
> + struct active_request_slot *slot;
> + struct curl_slist *headers = http_copy_default_headers();
> + int err;
> +
> + headers = curl_slist_append(headers, p->hdr_content_type);
> + headers = curl_slist_append(headers, p->hdr_accept);
> + headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
> +
> + slot = get_active_slot();
> +
> + curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
> + curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
> + curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
> + curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);

I looked at the Curl documentation for CURLOPT_HTTPHEADER and
curl_easy_setopt doesn't consume the argument here (in fact, it asks us
to keep "headers" around), so it might be possible to just generate the
headers once in proxy_state_init().

> +
> + /* Setup function to read request from client */
> + curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
> + curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
> +
> + /* Setup function to write server response to client */
> + curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
> + curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
> +
> + err = run_slot(slot, NULL);
> +
> + if (err != HTTP_OK)
> + err = -1;

This seems to mean that we cannot have two requests in flight at the
same time even while there is no response (from the fact that we have a
HTTP status code after returning from run_slot()).

I thought that git fetch over HTTP uses the two-requests-in-flight
optimization that it also does over other protocols like SSH, but I see
that that code path (fetch_git() in remote-curl.c) also uses run_slot()
indirectly, so maybe my assumption is wrong. Anyway, this is outside the
scope of this patch.

> +
> + curl_slist_free_all(headers);
> + return err;
> +}
> +
> +static int connect_half_duplex(const char *service_name)
> +{
> + struct discovery *discover;
> + struct proxy_state p;
> +
> + /*
> +  * Run the info/refs request 

Re: [PATCH 26/26] remote-curl: implement connect-half-duplex command

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:28 -0800
Brandon Williams  wrote:

> Teach remote-curl the 'connect-half-duplex' command which is used to
> establish a half-duplex connection with servers which support protocol
> version 2.  This allows remote-curl to act as a proxy, allowing the git
> client to communicate natively with a remote end, simply using
> remote-curl as a pass through to convert requests to http.
> 
> Signed-off-by: Brandon Williams 
> ---
>  remote-curl.c  | 185 
> -
>  t/t5701-protocol-v2.sh |  41 +++
>  2 files changed, 224 insertions(+), 2 deletions(-)

I didn't look at the usage of the curl API in detail, but overall this
looks good. I'm pleasantly surprised that it didn't take as many lines
of code as I expected.

Overall everything looks good, except for the points that I have brought
up in my other e-mails.

> diff --git a/remote-curl.c b/remote-curl.c
> index 4086aa733..b63b06398 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c

[snip]

> +struct proxy_state {
> + char *service_name;
> + char *service_url;
> + char *hdr_content_type;
> + char *hdr_accept;

Maybe document that the above 3 fields (service_url to hdr_accept) are
cached because we need to pass them to curl_easy_setopt() for every
request.


Re: [PATCH 20/26] fetch-pack: perform a fetch using v2

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:22 -0800
Brandon Williams  wrote:

> +static enum ack_type process_ack(const char *line, struct object_id *oid)
> +{
> + const char *arg;
> +
> + if (!strcmp(line, "NAK"))
> + return NAK;
> + if (skip_prefix(line, "ACK ", &arg)) {
> + if (!parse_oid_hex(arg, oid, &arg)) {
> + if (strstr(arg, "continue"))
> + return ACK_continue;

This function seems to be only used for v2, so I don't think we need to
parse "continue".

Also, maybe describe the plan for supporting functionality not supported
yet (e.g. server-side declaration of shallows and client-side "deepen").

It may be possible to delay support for server-side shallows on the
server (that is, only implement support for it in the client) since the
server can just declare that it doesn't support protocol v2 when serving
such repos (although it might just be easier to implement server-side
support in this case).

For "deepen", we need support for it both on the client and the server
now unless we plan to declare a "deepen" capability in the future (then,
as of these patches, clients that require "deepen" will use protocol v1;
when a new server declares "deepen", old clients will ignore it and keep
the status quo, and new clients can then use "deepen").

There may be others that I've missed.


Re: [PATCH 11/26] serve: introduce git-serve

2018-01-09 Thread Jonathan Tan
On Tue, 9 Jan 2018 14:16:42 -0800
Brandon Williams  wrote:

> All good documentation changes.

Thanks!

> > > + /*
> > > +  * Function called when a client requests the capability as a command.
> > > +  * The command request will be provided to the function via 'keys', the
> > > +  * capabilities requested, and 'args', the command specific parameters.
> > > +  *
> > > +  * This field should be NULL for capabilities which are not commands.
> > > +  */
> > > + int (*command)(struct repository *r,
> > > +struct argv_array *keys,
> > > +struct argv_array *args);
> > 
> > Looking at the code below, I see that the command is not executed unless
> > advertise returns true - this means that a command cannot be both
> > supported and unadvertised. Would this be too restrictive? For example,
> > this would disallow a gradual across-multiple-servers rollout in which
> > we allow but not advertise a capability, and then after some time,
> > advertise the capability.
> 
> One way to change this would be to just add another function to the
> struct which is called to check if the command is allowed, instead of
> relying on the same function to do that for both advertise and
> allow...though I don't see a big win for allowing a command but not
> advertising it.

My rationale for allowing a command but not advertising it is in the
paragraph above (that you quoted), but if that is insufficient
rationale, then I agree that we don't need to do this.

> > If we change this, then the value parameter of advertise can be
> > mandatory instead of optional.
> 
> I don't see how this fixes the issue you bring up.

This is a consequence, not a fix - if we were to do as I suggested, then
we no longer need to invoke advertise to check whether something is
advertised except when we are advertising them, in which case "value"
never needs to be NULL.


Re: [PATCH 13/26] connect: request remote refs using v2

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:15 -0800
Brandon Williams  wrote:

> diff --git a/connect.c b/connect.c
> index caa539b75..9badd403f 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -12,9 +12,11 @@
>  #include "sha1-array.h"
>  #include "transport.h"
>  #include "strbuf.h"
> +#include "version.h"
>  #include "protocol.h"
>  
>  static char *server_capabilities;
> +static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
>  static const char *parse_feature_value(const char *, const char *, int *);
>  
>  static int check_ref(const char *name, unsigned int flags)
> @@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
> "and the repository exists."));
>  }
>  
> +static int server_supports_v2(const char *c, int die_on_error)

Document what "c" means.

[snip]

> +static void process_capabilities_v2(struct packet_reader *reader)
> +{
> + while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
> + argv_array_push(&server_capabilities_v2, reader->line);
> + }

No need for braces on single-line blocks.

> +static int process_ref_v2(const char *line, struct ref ***list)

The "list" is the tail of a linked list, so maybe name it "tail"
instead.


Re: [PATCH 12/26] ls-refs: introduce ls-refs server command

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:14 -0800
Brandon Williams  wrote:

> +  symrefs: In addition to the object pointed by it, show the underlying
> +ref pointed by it when showing a symbolic ref.
> +  peel: Show peeled tags.
> +  ref-pattern <pattern>: When specified, only references matching the
> +  given patterns are displayed.

I notice "symrefs" being tested in patch 13 and "ref-pattern" being
tested in patch 16. Is it possible to make a test for "peel" as well?
(Or is it being tested somewhere I didn't notice?)


Re: [PATCH 11/26] serve: introduce git-serve

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:13 -0800
Brandon Williams  wrote:

> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> new file mode 100644
> index 0..b87ba3816
> --- /dev/null
> +++ b/Documentation/technical/protocol-v2.txt

I'll review the documentation later, once there is some consensus that
the overall design is OK. (Or maybe there already is consensus?)

> diff --git a/builtin/serve.c b/builtin/serve.c
> new file mode 100644
> index 0..bb726786a
> --- /dev/null
> +++ b/builtin/serve.c
> @@ -0,0 +1,30 @@
> +#include "cache.h"
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "serve.h"
> +
> +static char const * const grep_usage[] = {

Should be serve_usage.

> diff --git a/serve.c b/serve.c
> new file mode 100644
> index 0..da8127775
> --- /dev/null
> +++ b/serve.c

[snip]

> +struct protocol_capability {
> + const char *name; /* capability name */

Maybe document as:

  The name of the capability. The server uses this name when advertising
  this capability, and the client uses this name to invoke the command
  corresponding to this capability.

> + /*
> +  * Function queried to see if a capability should be advertised.
> +  * Optionally a value can be specified by adding it to 'value'.
> +  */
> + int (*advertise)(struct repository *r, struct strbuf *value);

Document what happens when value is appended to. For example:

  ... If value is appended to, the server will advertise this capability
  as <name>=<value> instead of <name>.

> + /*
> +  * Function called when a client requests the capability as a command.
> +  * The command request will be provided to the function via 'keys', the
> +  * capabilities requested, and 'args', the command specific parameters.
> +  *
> +  * This field should be NULL for capabilities which are not commands.
> +  */
> + int (*command)(struct repository *r,
> +struct argv_array *keys,
> +struct argv_array *args);

Looking at the code below, I see that the command is not executed unless
advertise returns true - this means that a command cannot be both
supported and unadvertised. Would this be too restrictive? For example,
this would disallow a gradual across-multiple-servers rollout in which
we allow but not advertise a capability, and then after some time,
advertise the capability.

If we change this, then the value parameter of advertise can be
mandatory instead of optional.


Re: [PATCH 09/26] transport: store protocol version

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:11 -0800
Brandon Williams  wrote:

> diff --git a/transport.c b/transport.c
> index 63c3dbab9..2378dcb38 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -118,6 +118,7 @@ struct git_transport_data {
>   struct child_process *conn;
>   int fd[2];
>   unsigned got_remote_heads : 1;
> + enum protocol_version version;

Should this be initialized to protocol_unknown_version? Right now, as
far as I can tell, it is zero-initialized, which means protocol_v0.


Re: [PATCH 07/26] connect: convert get_remote_heads to use struct packet_reader

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:09 -0800
Brandon Williams  wrote:

> - while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) {
> + while (state != EXPECTING_DONE) {
> + switch (packet_reader_read(&reader)) {
> + case PACKET_READ_EOF:
> + die_initial_contact(1);
> + case PACKET_READ_NORMAL:
> + len = reader.pktlen;
> + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))

This should be a field in reader, not the global packet_buffer, I think.

Also, I did a search of usages of packet_buffer, and there are just a
few of them - it might be worthwhile to eliminate it, and have each
component using it allocate its own buffer. But this can be done in a
separate patch set.

> @@ -269,6 +284,8 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
>   if (process_shallow(len, shallow_points))
>   break;
>   die("protocol error: unexpected '%s'", packet_buffer);

Here too.


Re: [PATCH 02/26] pkt-line: introduce struct packet_reader

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:04 -0800
Brandon Williams  wrote:

> diff --git a/pkt-line.h b/pkt-line.h
> index 06c468927..c446e886a 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -111,6 +111,63 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
>   */
>  ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
>  
> +struct packet_reader {
> + /* source file descriptor */
> + int fd;
> +
> + /* source buffer and its size */
> + char *src_buffer;
> + size_t src_len;
> +
> + /* buffer that pkt-lines are read into and its size */
> + char *buffer;
> + unsigned buffer_size;

Is the intention to support different buffers in the future?

[snip]

> +/*
> + * Peek the next packet line without consuming it and return the status.
> + * The next call to 'packet_reader_read()' will perform a read of the same line
> + * that was peeked, consuming the line.
> + *
> + * Only a single line can be peeked at a time.

It is logical to me that if you peeked at a line, and then peeked at it
again, you will get the same line - I would phrase this not as a
restriction ("only a single line") but just as a statement of fact (e.g.
"Peeking at the same line multiple times without an intervening
packet_reader_read will return the same result").

> + */
> +extern enum packet_read_status packet_reader_peek(struct packet_reader 
> *reader);
> +
>  #define DEFAULT_PACKET_MAX 1000
>  #define LARGE_PACKET_MAX 65520
>  #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)


Re: [PATCH 01/26] pkt-line: introduce packet_read_with_status

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:03 -0800
Brandon Williams  wrote:

> -int packet_read(int fd, char **src_buf, size_t *src_len,
> - char *buffer, unsigned size, int options)
> +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len,
> + char *buffer, unsigned size, int *pktlen,
> + int options)
>  {
> - int len, ret;
> + int len;
>   char linelen[4];
>  
> - ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
> - if (ret < 0)
> - return ret;
> + if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0)
> + return PACKET_READ_EOF;
> +
>   len = packet_length(linelen);
>   if (len < 0)
>   die("protocol error: bad line length character: %.4s", linelen);
> - if (!len) {
> +
> + if (len == 0) {

This change (replacing "!len" with "len == 0") is unnecessary, I think.

>   packet_trace("", 4, 0);
> - return 0;
> + return PACKET_READ_FLUSH;
> + } else if (len >= 1 && len <= 3) {
> + die("protocol error: bad line length character: %.4s", linelen);
>   }

This seems to be more of a "bad line length" than a "bad line length
character".

Also, some of the checks are redundant. Above, it is probably better to
delete "len >= 1", and optionally write "len < 4" instead of "len <= 3"
(to emphasize that the subtraction of 4 below does not result in a
negative value).

> +
>   len -= 4;
> - if (len >= size)
> + if ((len < 0) || ((unsigned)len >= size))
>   die("protocol error: bad line length %d", len);

The "len < 0" check is redundant.

> - ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
> - if (ret < 0)
> - return ret;
> +
> + if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0)
> + return PACKET_READ_EOF;
>  
>   if ((options & PACKET_READ_CHOMP_NEWLINE) &&
>   len && buffer[len-1] == '\n')
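
For readers without the pkt-line format in their head: each packet is a
four-hex-digit length (which counts the four length bytes themselves
plus the payload) followed by the payload, and "0000" is reserved as a
flush-pkt - which is why a length of 1 to 3 can never be valid. A quick
shell sketch of the framing:

    pkt_line () {
        printf "%04x%s" $((4 + ${#1})) "$1"
    }
    pkt_line "hello"   # emits "0009hello"
    printf "0000"      # flush-pkt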


Re: [PATCH 00/26] protocol version 2

2018-01-09 Thread Jonathan Tan
On Tue,  2 Jan 2018 16:18:02 -0800
Brandon Williams  wrote:

> The following patches extend what I sent out as a WIP
> (https://public-inbox.org/git/20171204235823.63299-1-bmw...@google.com/) and
> implement protocol version 2.

Summarizing (for myself) the rationale for protocol version 2:

The existing protocol has a few pain points: (a) limit on the length of
the capability line (the capability line can be used to include
additional parameters in a backwards-compatible way), (b) difficulty in
creating proxies because of inconsistent flush semantics, and (c) the
need to implement clients twice - once for HTTP and once for
connect-supporting transports. To which we can add another: (d) if we
want to support something entirely new (for example, a server-side "git
log"), we will need a new protocol anyway.

The new functionality introduced in this patch set is probably best done
using a new protocol. If it were done using the existing protocol (by
adding a parameter in the capabilities line), we would still run into
(a) and (c), so we might as well introduce the new protocol now.

Some of the above points are repeats from my previous e-mail:
https://public-inbox.org/git/20171110121347.1f7c184c543622b60164e...@google.com/

> Some changes from that series are as follows:
>  * Lots of various cleanup on the ls-refs and fetch command code, both server
>    and client.
>  * Fetch command now supports a stateless-rpc mode which enables communicating
>    with a half-duplex connection.

Good to hear about fetch support.

>  * Introduce a new remote-helper command 'connect-half-duplex' which is
>    implemented by remote-curl (the http remote-helper).  This allows for a
>    client to establish a half-duplex connection and use remote-curl as a proxy
>    to wrap requests in http before sending them to the remote end and
>    unwrapping the responses and sending them back to the client's stdin.

I'm not sure about the "half-duplex" name - it is half-duplex in that
each side must terminate their communications with a flush, but not
half-duplex in that request-response pairs can overlap each other (e.g.
during negotiation during fetch - there is an optimization in which the
client tries to keep two requests pending at a time). I think that the
idea we want to communicate is that requests and responses are always
packetized, stateless, and always happen as a pair.

I wonder if "stateless-connect" is a better keyword - it makes sense to
me (once described) that "stateless" implies that the client sends
everything the server needs at once (thus, in a packet), the server
sends everything the client needs back at once (thus, in a packet), and
then the client must not assume any state-storing on the part of the
server or transport.

>  * The transport code is refactored for ls-remote, fetch, and push to provide a
>    list of ref-patterns (based on the refspec being used) when requesting refs
>    from the remote end.  This allows the ls-refs code to send this list of
>    patterns so the remote end can filter the refs it sends back.

Briefly looking at the implementation, the client seems to incur an
extra roundtrip when using ls-remote (and others) with a v2-supporting
server. I initially didn't like this, but upon further reflection, this
is probably fine for now. The client can be upgraded later, and I think
that clients will eventually want to query git-serve directly for
"ls-refs" first, and then fall back to v0 for ancient servers, instead
of checking git-upload-pack first (as in this patch set) - so, the
support for "ls-refs" here won't be carried forward merely for backwards
compatibility, but will eventually be actively used.

As for the decision to use a new endpoint "git-serve" instead of reusing
"git-upload-pack" (or "git-receive-pack"), reusing the existing one
might allow some sort of optimization later in which the first
"git-upload-pack" query immediately returns with the v2 answer (instead
of redirecting the client to "git-serve"), but this probably doesn't
matter in practice (as I stated above, I think that eventually clients
will query git-serve first).

> This series effectively implements protocol version 2 for listing a remotes
> refs (ls-remote) as well as for fetch for the builtin transports (ssh, git,
> file) and for the http/https transports.  Push is not implemented yet and
> doesn't need to be implemented at the same time as fetch since the
> receive-pack code can default to using protocol v0 when v2 is requested by the
> client.

Agreed - push can be done later.


Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes

2018-01-08 Thread Jonathan Tan
On Mon, 8 Jan 2018 15:35:59 -0500
Derrick Stolee  wrote:

> Thanks! That is certainly the idea. If you know about MIDX, then you can 
> benefit from it. If you do not, then you have all the same data 
> available to you do to your work. Having a MIDX file will not break 
> other tools (libgit2, JGit, etc.).
> 
> One thing I'd like to determine before this patch goes to v1 is how much 
> we should make the other packfile-aware commands also midx-aware. My gut 
> reaction right now is to have git-repack call 'git midx --clear' if 
> core.midx=true and a packfile was deleted. However, this could easily be 
> changed with 'git midx --clear' followed by 'git midx --write 
> --update-head' if midx-head exists.

My opinion is that these are sufficient:
 (a) functionality to create a .midx file from scratch (deleting any
 existing ones)
 (b) either:
 - update of packfile.c to read (one or more) midx files (like in
   patch 18), and possibly usage in a benchmark, or
 - any other usage of midx file (e.g. abbreviations, like in patch
   17)

In general, a way to create them (so that they can be used from a
cronjob, etc.), and a way to consume them to show that the new way works
and is indeed faster. This functionality in itself might be sufficient
to merge in - this would already be useful in situations where it is
undesirable for repacks to happen often, and we can "bridge" the
intervals between repacks using a cronjob that periodically generates
.midx files in order to keep up the object lookup performance.

In particular, I think that it is fine to omit a more sophisticated
"repack" for now - the .midx mechanism must tolerate packs referenced by
.midx files suddenly disappearing anyway, and in this way, at least we
can demonstrate that the .midx mechanism still works in this case.
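
To make the cronjob idea concrete - a hypothetical sketch only, using
the 'git midx' invocations quoted above from the RFC (these commands
are not in any released Git):

    # nightly, between full repacks: index any newly arrived packfiles
    cd /srv/repo.git &&
    git midx --write --update-head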


Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes

2018-01-08 Thread Jonathan Tan
On Sun,  7 Jan 2018 13:14:42 -0500
Derrick Stolee  wrote:

> +Design Details
> +--
> +
> +- The MIDX file refers only to packfiles in the same directory
> +  as the MIDX file.
> +
> +- A special file, 'midx-head', stores the hash of the latest
> +  MIDX file so we can load the file without performing a dirstat.
> +  This file is especially important with incremental MIDX files,
> +  pointing to the newest file.

I presume that the actual MIDX files are named by hash? (You might have
written this somewhere that I somehow missed.)

Also, I notice that in the "Future Work" section, the possibility of
multiple MIDX files is raised. Could this 'midx-head' file be allowed to
store multiple such files? That way, we avoid a bit of file format
churn (in that we won't need to define a new "chunk" in the future).

> +- If a packfile exists in the pack directory but is not referenced
> +  by the MIDX file, then the packfile is loaded into the packed_git
> +  list and Git can access the objects as usual. This behavior is
> +  necessary since other tools could add packfiles to the pack
> +  directory without notifying Git.
> +
> +- The MIDX file should be only a supplemental structure. If a
> +  user downgrades or disables the `core.midx` config setting,
> +  then the existing .idx and .pack files should be sufficient
> +  to operate correctly.

Let me try to summarize: so, at this point, there are no
backwards-incompatible changes to the repo disk format. Unupdated code
paths (and old versions of Git) can just read the .idx and .pack files,
as always. Updated code paths will look at the .midx and .idx files, and
will sort them as follows:
 - .midx files go into a data structure
 - .idx files not referenced by any .midx files go into the
   existing packed_git data structure

A writer can either merely write a new packfile (like old versions of
Git) or write a packfile and update the .midx file, and everything above
will still work. In the event that a writer deletes an existing packfile
referenced by a .midx (for example, old versions of Git during a
repack), we will lose the advantages of the .midx file - we will detect
that the .midx no longer works when attempting to read an object given
its information, but in this case, we can recover by dropping the .midx
file and loading all the .idx files it references that still exist.

As a reviewer, I think this is a very good approach, and this does make
things easier to review (as opposed to, say, an approach where a lot of
the code must be aware of .midx files).
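
Concretely, under this design the pack directory of an updated
repository might look like this (file names hypothetical, following the
RFC's description of 'midx-head'):

    $ ls .git/objects/pack
    midx-head            # contains the hash of the newest .midx file
    midx-<hash>.midx     # hypothetical naming; covers the packs below
    pack-<hash>.idx      # still present and usable by older Git
    pack-<hash>.pack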


Re: [FYI PATCH] t/helper/test-lazy-name-hash: fix compilation

2017-12-18 Thread Jonathan Tan
On Mon, 18 Dec 2017 13:49:47 -0800
Stefan Beller  wrote:

> I was compiling origin/master with stricter compiler flags today
> and was greeted by
> 
> t/helper/test-lazy-init-name-hash.c: In function ‘cmd_main’:
> t/helper/test-lazy-init-name-hash.c:172:5: error: ‘nr_threads_used’ may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
>  printf("avg [size %8d] [single %f] %c [multi %f %d]\n",
>  ^~~
>  nr,
>  ~~~
>  (double)avg_single/10,
>  ~~
>  (avg_single < avg_multi ? '<' : '>'),
>  ~
>  (double)avg_multi/10,
>  ~
>  nr_threads_used);
>  
> t/helper/test-lazy-init-name-hash.c:115:6: note: ‘nr_threads_used’ was 
> declared here
>   int nr_threads_used;
>   ^~~
> 
> I do not see how we can arrive at that line without having `nr_threads_used`
> initialized, as we'd have `count > 1`  (which asserts that we ran the
> loop above at least once, such that it *should* be initialized).

Your analysis makes sense. (The compiler probably couldn't detect it because
"count" is a static variable, not a local variable.)

> --- a/t/helper/test-lazy-init-name-hash.c
> +++ b/t/helper/test-lazy-init-name-hash.c
> @@ -112,7 +112,7 @@ static void analyze_run(void)
>  {
>   uint64_t t1s, t1m, t2s, t2m;
>   int cache_nr_limit;
> - int nr_threads_used;
> + int nr_threads_used = 0;
>   int i;
>   int nr;

I agree that this is probably the best way to fix it. Another way might
be to omit printing out the number of threads used in the printf that
prints the average statistics.

The best way is probably to not use so many global variables, but that
is out of scope of this change.


Re: Fetching commit instead of ref

2017-12-18 Thread Jonathan Tan
On Mon, 18 Dec 2017 12:30:23 +
"Carlsson, Magnus"  wrote:

> In a certain situation I would really need to fetch all commits
> related to a specific commit (SHA). I have read the git fetch
> documentation and found nothing regarding this. It only seems to
> support fetching references.

The documentation has been updated in version 2.15.0 to describe this.
But as the commit message of commit 83558a412afa ("fetch doc: src side
of refspec could be full SHA-1", 2017-10-18) says, this functionality
was available earlier.

> I found some traces on stack overflow:
> https://stackoverflow.com/questions/14872486/retrieve-specific-commit-from-a-remote-git-repository
> 
> Following that recommendation it feels like it almost works:
> $ git fetch subrepo 50f730db793e0733b159326c5a3e78fd48cedfec:refs/remote/subrepo/foo-commit
> remote: Counting objects: 2311, done.
> remote: Total 2311 (delta 0), reused 0 (delta 0), pack-reused 2311
> Receiving objects: 100% (2311/2311), 703.64 KiB | 0 bytes/s, done.
> Resolving deltas: 100% (1174/1174), done.
> So far so good, but then an error message appears:
error: Server does not allow request for unadvertised object 50f730db793e0733b159326c5a3e78fd48cedfec
> And nothing seems to be fetched.
> 
> Is there a way to fetch a commit and any ancestors to that commit based on a 
> SHA?

You'll need to set uploadpack.allowTipSHA1InWant,
uploadpack.allowReachableSHA1InWant, or uploadpack.allowAnySHA1InWant on
the server. (This might need to be better documented - I see this
documented for fetch-pack, but not fetch.)
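
For example, with the server-side knob set (the config names are
documented in git-config(1)):

    # on the server
    git config uploadpack.allowAnySHA1InWant true

    # then, on the client, the fetch above succeeds
    git fetch subrepo 50f730db793e0733b159326c5a3e78fd48cedfec:refs/remote/subrepo/foo-commit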


[PATCH 0/2] More transport API improvements

2017-12-14 Thread Jonathan Tan
Note that this is built on jt/transport-no-more-rsync.

I have found the transport mechanism relatively complicated, so here is
some more effort in the hope of making it more readily understood.

Patch 1 is probably good to go in as-is.

Patch 2 modifies the transport API, making certain variables in the
transport interface struct more private; it might need more discussion.
I also discuss the possible future work that this modification makes
possible.

Jonathan Tan (2):
  clone, fetch: remove redundant transport check
  transport: make transport vtable more private

 builtin/clone.c  |  3 ---
 builtin/fetch.c  |  3 ---
 transport-helper.c   | 23 +++---
 transport-internal.h | 61 ++
 transport.c  | 69 
 transport.h  | 54 ++--
 6 files changed, 120 insertions(+), 93 deletions(-)
 create mode 100644 transport-internal.h

-- 
2.15.1.504.g5279b80103-goog



[PATCH 2/2] transport: make transport vtable more private

2017-12-14 Thread Jonathan Tan
Move the definition of the transport-specific functions provided by
transports, whether declared in transport.c or transport-helper.c, into
an internal header. This means that transport-using code (as opposed to
transport-declaring code) can no longer access these functions (without
importing the internal header themselves), making it clear that they
should use the transport_*() functions instead, and also allowing the
interface between the transport mechanism and an individual transport to
independently evolve.

This is superficially a reversal of commit 824d5776c3f2 ("Refactor
struct transport_ops inlined into struct transport", 2007-09-19).
However, the scope of the involved variables was neither affected nor
discussed in that commit, and I think that the advantages in making
those functions more private outweigh the advantages described in that
commit's commit message. A minor additional point is that the code has
gotten more complicated since then, in that the function-pointer
variables are potentially mutated twice (once initially and once if
transport_take_over() is invoked), increasing the value of corralling
them into their own struct.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
The main evolution I see is to move all traces of the "takeover"
mechanism to transport.c, instead of having individual transports (the
builtin smart transport in transport.c and the remote helper
interface in transport-helper.c) do the connect-then-fetch etc. thing
themselves. This means that transport_vtable->connect must be allowed to
return "fallback", and we have the following vtables:
 - the one for bundles (unchanged)
 - one for builtin smart transports in which fetch and push_refs are
   NULL
 - one for remote helpers (unchanged)
 - one for the result of takeover (unchanged) (to be used when
   get_refs_list/fetch/push is used with a connect-supporting transport)
 - one for the result of transport_connect (connect, fetch, and
   push_refs are NULL) (to be used when user code invokes
   transport_connect in order to control the stream directly)

transport_get_remote_refs() and friends will always attempt to connect
(and perform the takeover if it succeeds) before proceeding.

In this way, functionality becomes clearer:
 - Transports that do not support connect leave connect as NULL and
   implement get_refs_list/fetch/push.
 - Transports that support connect implement connect. If connect is
   do-or-die (e.g. builtin smart transports), leave get_refs_list,
   fetch, and push as NULL. Otherwise (e.g. remote helpers) populate
   them, and they will be used as a fallback if connect fails.
 - When transport_connect() is invoked, the vtable is switched so that
   connect/get_refs_list/fetch/push no longer work. The user code has
   full control.
 - When transport_get_remote_refs() (or friends) is invoked, if connect
   is successful, the vtable is switched so that connect no longer
   works. get_refs_list/fetch/push use the established connection, and
   it is clear that the user can no longer call transport_connect().

This seems feasible to me, but I haven't tried to implement it yet.
---
 transport-helper.c   | 23 +++---
 transport-internal.h | 61 ++
 transport.c  | 69 
 transport.h  | 54 ++--
 4 files changed, 120 insertions(+), 87 deletions(-)
 create mode 100644 transport-internal.h

diff --git a/transport-helper.c b/transport-helper.c
index c948d5215..1a4b43ff1 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -11,6 +11,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "refs.h"
+#include "transport-internal.h"
 
 static int debug;
 
@@ -650,7 +651,7 @@ static int fetch(struct transport *transport,
 
if (process_connect(transport, 0)) {
do_take_over(transport);
-   return transport->fetch(transport, nr_heads, to_fetch);
+   return transport->vtable->fetch(transport, nr_heads, to_fetch);
}
 
count = 0;
@@ -987,7 +988,7 @@ static int push_refs(struct transport *transport,
 
if (process_connect(transport, 1)) {
do_take_over(transport);
-   return transport->push_refs(transport, remote_refs, flags);
+   return transport->vtable->push_refs(transport, remote_refs, flags);
}
 
if (!remote_refs) {
@@ -1035,7 +1036,7 @@ static struct ref *get_refs_list(struct transport *transport, int for_push)
 
if (process_connect(transport, for_push)) {
do_take_over(transport);
-   return transport->get_refs_list(transport, for_push);
+   return transport->vtable->get_refs_list(transport, for_push);
}
 
if (data->push && for_push)
@

[PATCH 1/2] clone, fetch: remove redundant transport check

2017-12-14 Thread Jonathan Tan
Prior to commit a2d725b7bdf7 ("Use an external program to implement
fetching with curl", 2009-08-05), if Git was compiled with NO_CURL, the
get_refs_list and fetch methods in struct transport might not be
populated, hence the checks in clone and fetch. After that commit, all
transports populate get_refs_list and fetch, making the checks in clone
and fetch redundant. Remove those checks.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 builtin/clone.c | 3 ---
 builtin/fetch.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98f8..979af0383 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1083,9 +1083,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
warning(_("--local is ignored"));
transport->cloning = 1;
 
-   if (!transport->get_refs_list || (!is_local && !transport->fetch))
-   die(_("Don't know how to clone %s"), transport->url);
-
transport_set_option(transport, TRANS_OPT_KEEP, "yes");
 
if (option_depth)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c73492..09eb1fc17 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1094,9 +1094,6 @@ static int do_fetch(struct transport *transport,
tags = TAGS_UNSET;
}
 
-   if (!transport->get_refs_list || !transport->fetch)
-   die(_("Don't know how to fetch from %s"), transport->url);
-
/* if not appending, truncate FETCH_HEAD */
if (!append && !dry_run) {
retcode = truncate_fetch_head();
-- 
2.15.1.504.g5279b80103-goog



[PATCH] transport: remove unused "push" in vtable

2017-12-12 Thread Jonathan Tan
After commit 0d0bac67ce3b ("transport: drop support for git-over-rsync",
2016-02-01), no transport in Git populates the "push" entry in the
transport vtable. Remove this entry.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
I was taking a look at the transport code and noticed that push is
unused (and push_refs is used instead). Here is a code cleanup.
---
 transport.c | 9 +
 transport.h | 1 -
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/transport.c b/transport.c
index 7231d1b1b..7cc39b7c0 100644
--- a/transport.c
+++ b/transport.c
@@ -627,7 +627,6 @@ void transport_take_over(struct transport *transport,
transport->set_option = NULL;
transport->get_refs_list = get_refs_via_connect;
transport->fetch = fetch_refs_via_pack;
-   transport->push = NULL;
transport->push_refs = git_transport_push;
transport->disconnect = disconnect_git;
transport->smart_options = &(data->options);
@@ -969,13 +968,7 @@ int transport_push(struct transport *transport,
*reject_reasons = 0;
transport_verify_remote_names(refspec_nr, refspec);
 
-   if (transport->push) {
-   /* Maybe FIXME. But no important transport uses this case. */
-   if (flags & TRANSPORT_PUSH_SET_UPSTREAM)
-   die("This transport does not support using 
--set-upstream");
-
-   return transport->push(transport, refspec_nr, refspec, flags);
-   } else if (transport->push_refs) {
+   if (transport->push_refs) {
struct ref *remote_refs;
struct ref *local_refs = get_local_heads();
int match_flags = MATCH_REFS_NONE;
diff --git a/transport.h b/transport.h
index bc5571574..ab4fe7f27 100644
--- a/transport.h
+++ b/transport.h
@@ -103,7 +103,6 @@ struct transport {
 * process involved generating new commits.
 **/
int (*push_refs)(struct transport *transport, struct ref *refs, int 
flags);
-   int (*push)(struct transport *connection, int refspec_nr, const char 
**refspec, int flags);
int (*connect)(struct transport *connection, const char *name,
   const char *executable, int fd[2]);
 
-- 
2.15.1.424.g9478a66081-goog



Re: [PATCH] Makefile: allow NO_DC_SHA1_SUBMODULE to be set in config.mak

2017-12-12 Thread Jonathan Tan
On Tue, 12 Dec 2017 11:57:28 -0800
Junio C Hamano  wrote:

> Junio C Hamano  writes:
> 
> > Makes sense.  The patch looks scary by appearing to move the
> > includes far to the front of the Makefile, but it in fact is moving
> > the NO_DC_SHA1_SUBMODULE block slightly down and it is a sensible
> > and safe move.
> 
> A completely unrelated tangent.  This was a good change to try your
> anchored diff on.  Viewing this change with
> 
>$ git show --anchored='include config.mak'
> 
> looks a lot less scary than the way it is shown by default (which
> matches what was posted on the list).
> 
> Good job.

Thanks, glad that it helped.


Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-11 Thread Jonathan Tan
On Fri, 8 Dec 2017 14:30:10 -0800
Brandon Williams  wrote:

> I just finished reading through parts 1-3.  Overall I like the series.
> There are a few points that I'm not a big fan of, but I wasn't able to
> come up with a better alternative.  One of these being the need for a
> global variable to tell the fetch-object logic to not go to the server
> to try and fetch a missing object.

I didn't really like that approach either, but I went with it because,
like you, I couldn't come up with a better one. The main issue is that
too many functions (e.g. parse_commit() in commit.c) indirectly read
objects, and I couldn't find a better way to control them all. Ideally,
we should have a "struct object_store" (or maybe "struct repository"
could do this too) on which we can set "fetch_if_missing", and have all
object-reading functions take a pointer to this struct. Or completely
separate the object-reading and object-parsing code (e.g. commit.c
should not be able to read objects at all). Or both.

Any of these would be major undertakings, though, and there are good
reasons for why the same function does the reading and parsing (for
example, parse_commit() does not perform any reading if the object has
been already parsed).

> One other thing I noticed: it looks like when you discover that you
> are missing a blob, you'll try to fault it in from the server without
> first checking that it's an object the server would even have.  Shouldn't
> you first do a check to verify that the object in question is a promised
> object before you go out to contact the server to request it?  You may
> have already ruled this out for some reason I'm not aware of (maybe it's
> too costly to compute?).

It is quite costly to compute - in the worst case, we would need to read
every object of certain types in every promisor packfile (e.g. if we
know that we're fetching a blob, we need to read every tree) to find out
if the object we want is a promisor object.

Such a check would be better at surfacing mistakes (e.g. the user giving
the wrong SHA-1) early, but beyond that, I don't think that having the
check is very important. Consider these two very common situations:

 (1) Fetching a single branch by its tip's SHA-1. A naive implementation
 will first check if we have that SHA-1, which triggers the dynamic
 fetch (since it is an object read), and assuming success, notice
 that we indeed have that tip, and not fetch anything else. The
 check you describe will avoid this situation.
 (2) Dynamically fetching a missing blob by its SHA-1. A naive
 implementation will first check if we have that SHA-1, which
 triggers the dynamic fetch, and that fetch will first check if we
 have that SHA-1, and so on (thus, an infinite loop). The check you
 describe will not avoid that situation.

The check solves (1), but we still need a solution to (2) - I used
"fetch_if_missing", as discussed in your previous question and my answer
to that. A solution to (2) is usually also a solution to (1), so the
check wouldn't help much here.
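As a rough sketch of how such a global breaks the loop in (2) - the
function names below are illustrative stand-ins, not the actual ones in
the series:

    struct object_id;

    extern void *read_object_from_store(const struct object_id *oid);
    extern void fetch_object_from_promisor(const struct object_id *oid);

    int fetch_if_missing = 1;

    void *read_object(const struct object_id *oid)
    {
        void *data = read_object_from_store(oid);

        if (data || !fetch_if_missing)
            return data;
        /*
         * Disable dynamic fetching while the fetch itself runs, so
         * that any object reads it performs cannot recurse into
         * another fetch - situation (2) above.
         */
        fetch_if_missing = 0;
        fetch_object_from_promisor(oid);
        fetch_if_missing = 1;
        return read_object_from_store(oid);
    }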


Re: [PATCH] decorate: clean up and document API

2017-12-11 Thread Jonathan Tan
On Fri, 8 Dec 2017 04:55:11 -0500
Jeff King  wrote:

> I have mixed feelings. On the one hand, compiling and running the code
> ensures that those things actually work. On the other hand, I expect you
> can make a much clearer example if instead of having running code, you
> show snippets of almost-code.
> 
> E.g.:
> 
>   struct decoration d = { NULL };
> 
>   add_decoration(&d, obj, "foo");
>   ...
>   str = lookup_decoration(&d, obj);
> 
> pretty much gives the relevant overview, with very little boilerplate.
> Yes, it omits things like the return value of add_decoration(), but
> those sorts of details are probably better left to the function
> docstrings.

The part about iterating over all entries should probably be shown as
well. Besides that, I'm OK with having a simplified example in the
documentation too, but I'll wait and see if others have any opinions
before making any changes.
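For reference, iterating over all entries would look roughly like this
(a sketch against the struct layout in the patch under discussion;
handle() is a stand-in for whatever the caller does with each entry):

    #include "decorate.h"

    extern void handle(const struct object *base, void *decoration);

    void visit_all_decorations(struct decoration *n)
    {
        int i;

        for (i = 0; i < n->size; i++) {
            const struct object *base = n->entries[i].base;
            void *decoration = n->entries[i].decoration;

            if (!base)
                continue; /* unused hash slot */
            handle(base, decoration);
        }
    }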

> Other than that philosophical point, the documentation you added looks
> pretty good to me. Two possible improvements to the API we could do on
> top:
> 
>   1. Should there be a DECORATION_INIT macro (possibly taking the "name"
>  as an argument)? (Actually, the whole name thing seems like a
>  confusing and bad API design in the first place).

Agreed about the "name" thing. I'll add a DECORATION_INIT when I make
the next reroll, but I think that having it with no argument is best
(and instantiating "name" with NULL).

>   2. This is really just an oidmap to a void pointer. I wonder if we
>  ought to be wrapping that code (I think we still want some
>  interface so that the caller doesn't have to declare their own
>  structs).

It is slightly different from oidmap in that this uses "struct object *"
as a key whereas oidmap uses "struct object_id", meaning that a user of
"decorate" must already have objects allocated or be willing to allocate
them, whereas a user of "oidmap" doesn't.

Having said that, it is true that perhaps we have too many data
structures doing similar things.
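To spell that difference out, here are the two lookup keys side by side
(signatures as in decorate.h and oidmap.h):

    /* decorate: keyed by the address of an already-allocated object */
    void *add_decoration(struct decoration *n, const struct object *obj,
                         void *decoration);
    void *lookup_decoration(struct decoration *n, const struct object *obj);

    /* oidmap: keyed by object ID value; the caller embeds a struct
     * oidmap_entry (which holds the oid) at the start of its own entry */
    void *oidmap_put(struct oidmap *map, void *entry);
    void *oidmap_get(const struct oidmap *map, const struct object_id *key);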


[PATCH] decorate: clean up and document API

2017-12-07 Thread Jonathan Tan
Improve the names of the identifiers in decorate.h, document them, and
add an example of how to use these functions.

The example is compiled and run as part of the test suite.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
This patch contains some example code in a test helper.

There is a discussion at [1] about where example code for hashmap should
go. For something relatively simple, like decorate, perhaps example code
isn't necessary; but if we include example code anyway, I think that we
might as well make it compiled and tested, so this patch contains that
example code in a test helper.

[1] 
https://public-inbox.org/git/466dd5331907754ca5c25cc83173ca895220ca81.1511999045.git.johannes.schinde...@gmx.de/
---
 Documentation/technical/api-decorate.txt |  6 ---
 Makefile |  1 +
 builtin/fast-export.c|  2 +-
 decorate.c   | 28 ++--
 decorate.h   | 49 +++--
 t/helper/.gitignore  |  1 +
 t/helper/test-example-decorate.c | 74 
 t/t9004-example.sh   | 10 +
 8 files changed, 146 insertions(+), 25 deletions(-)
 delete mode 100644 Documentation/technical/api-decorate.txt
 create mode 100644 t/helper/test-example-decorate.c
 create mode 100755 t/t9004-example.sh

diff --git a/Documentation/technical/api-decorate.txt 
b/Documentation/technical/api-decorate.txt
deleted file mode 100644
index 1d52a6ce1..0
--- a/Documentation/technical/api-decorate.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-decorate API
-
-
-Talk about 
-
-(Linus)
diff --git a/Makefile b/Makefile
index fef9c8d27..df52cc1c3 100644
--- a/Makefile
+++ b/Makefile
@@ -651,6 +651,7 @@ TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
+TEST_PROGRAMS_NEED_X += test-example-decorate
 TEST_PROGRAMS_NEED_X += test-fake-ssh
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f8fe04ca5..796d0cd66 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -895,7 +895,7 @@ static void export_marks(char *file)
 {
unsigned int i;
uint32_t mark;
-   struct object_decoration *deco = idnums.hash;
+   struct decoration_entry *deco = idnums.entries;
FILE *f;
int e = 0;
 
diff --git a/decorate.c b/decorate.c
index 270eb2519..de31331fa 100644
--- a/decorate.c
+++ b/decorate.c
@@ -14,20 +14,20 @@ static unsigned int hash_obj(const struct object *obj, 
unsigned int n)
 static void *insert_decoration(struct decoration *n, const struct object 
*base, void *decoration)
 {
int size = n->size;
-   struct object_decoration *hash = n->hash;
+   struct decoration_entry *entries = n->entries;
unsigned int j = hash_obj(base, size);
 
-   while (hash[j].base) {
-   if (hash[j].base == base) {
-   void *old = hash[j].decoration;
-   hash[j].decoration = decoration;
+   while (entries[j].base) {
+   if (entries[j].base == base) {
+   void *old = entries[j].decoration;
+   entries[j].decoration = decoration;
return old;
}
if (++j >= size)
j = 0;
}
-   hash[j].base = base;
-   hash[j].decoration = decoration;
+   entries[j].base = base;
+   entries[j].decoration = decoration;
n->nr++;
return NULL;
 }
@@ -36,24 +36,23 @@ static void grow_decoration(struct decoration *n)
 {
int i;
int old_size = n->size;
-   struct object_decoration *old_hash = n->hash;
+   struct decoration_entry *old_entries = n->entries;
 
n->size = (old_size + 1000) * 3 / 2;
-   n->hash = xcalloc(n->size, sizeof(struct object_decoration));
+   n->entries = xcalloc(n->size, sizeof(struct decoration_entry));
n->nr = 0;
 
for (i = 0; i < old_size; i++) {
-   const struct object *base = old_hash[i].base;
-   void *decoration = old_hash[i].decoration;
+   const struct object *base = old_entries[i].base;
+   void *decoration = old_entries[i].decoration;
 
if (!decoration)
continue;
insert_decoration(n, base, decoration);
}
-   free(old_hash);
+   free(old_entries);
 }
 
-/* Add a decoration pointer, return any old one */
 void *add_decoration(struct decoration *n, const struct object *obj,
void *decoration)
 {
@@ -64,7 +63,6 @@ void *add_decoration(struct decoration *n, const struct 
object *obj,
return insert_decoratio

Re: [PATCH v6 03/12] fsck: support refs pointing to promisor objects

2017-12-07 Thread Jonathan Tan
On Thu, 7 Dec 2017 11:18:52 -0800
Brandon Williams  wrote:

> Instead of requiring that every test first removes 'repo', maybe you
> want to have each test do its own cleanup by adding in
> 'test_when_finished' lines to do the removals?  Just a thought.

If "test_when_finished" is the style that we plan to use in the project,
we can do that.

But I think the "rm -rf" at the beginning of a test method is better
than "test_when_finished", though. It makes the test independent (for
example, the addition or removal of tests before such a test is less
likely to affect that test), and makes it clear if (and how) the test
does its own setup as opposed to requiring the setup from another test
block.
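As a minimal sketch of the two styles (inside a t/ script that sources
./test-lib.sh; the test bodies here are illustrative):

    # setup at the start: a leftover "repo" from an earlier or failed
    # test cannot affect this one
    test_expect_success 'self-contained setup' '
        rm -rf repo &&
        git init repo &&
        git -C repo fsck
    '

    # cleanup at the end: relies on test_when_finished to run the removal
    test_expect_success 'cleanup when finished' '
        test_when_finished "rm -rf repo" &&
        git init repo &&
        git -C repo fsck
    '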


Re: [PATCH v6 00/12] Partial clone part 2: fsck and promisors

2017-12-05 Thread Jonathan Tan
On Tue,  5 Dec 2017 16:58:42 +
Jeff Hostetler <g...@jeffhostetler.com> wrote:

> From: Jeff Hostetler <jeffh...@microsoft.com>
> 
> This is V6 of part 2 of partial clone.  This assumes V6 of part 1
> is already present.  This version fixes a problem in fetch-pack
> observed in V5.  It also contains 2 "fixup" commits that are
> WIP responses to comments on V5.

A note on the fix of a problem in fetch-pack observed in V5: to do this,
I renamed the "no_haves" setting to "no_dependents", since this setting
now has a broader effect than merely suppressing "have" lines. This
setting is described in patch 7 ("introduce fetch-object: fetch one
promisor object").

> Part 2 is concerned with fsck, gc, initial support for dynamic
> object fetching, and tracking promisor objects.  Jonathan Tan
> originally developed this code.  I have moved it on top of
> part 1 and updated it slightly.

Thanks. I checked the diff between this and V5 and it looks as I
expected. ("git am -3" didn't work on the patches as e-mailed to the
list, though - I had to use the one hosted at GitHub [1].)

[1] https://github.com/jeffhostetler/git/tree/core/pc6_p2


Re: What's cooking in git.git (Dec 2017, #01; Mon, 4)

2017-12-04 Thread Jonathan Tan
On Mon, 04 Dec 2017 13:46:43 -0800
Junio C Hamano  wrote:

> * jt/diff-anchored-patience (2017-11-28) 1 commit
>  - diff: support anchoring line(s)
> 
>  "git diff" learned a variant of the "--patience" algorithm, to
>  which the user can specify which 'unique' line to be used as
>  anchoring points.

Is there anything I can do to progress this? Johannes Schindelin wrote
that he is in favor of this feature [1]. He also remarked on the fact
that the anchors are not freed [2], but I have replied [3] that it seems
that diff arguments are not freed in general (so there would be no good
place to put the freeing statements).

[1] 
https://public-inbox.org/git/alpine.DEB.2.21.1.1712011447550.98586@virtualbox/
[2] 
https://public-inbox.org/git/alpine.DEB.2.21.1.1711300134560.6482@virtualbox/
[3] 
https://public-inbox.org/git/20171130152605.1b775e9cc2ddd7f917424...@google.com/


[WIP 2/2] submodule: read-only super-backed ref backend

2017-12-01 Thread Jonathan Tan
Note that a few major parts are still missing:
 - special handling of the current branch of the superproject
 - writing (whether "refs/..." to the superproject as an index change or
   a commit, or non-"refs/..." directly to the subproject like usual)

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 Makefile   |   1 +
 refs.c |  11 +-
 refs/refs-internal.h   |   1 +
 refs/sp-backend.c  | 261 +
 submodule.c|  43 +--
 submodule.h|   2 +
 t/t1406-submodule-ref-store.sh |  26 
 7 files changed, 331 insertions(+), 14 deletions(-)
 create mode 100644 refs/sp-backend.c

diff --git a/Makefile b/Makefile
index e53750ca0..74120b5d7 100644
--- a/Makefile
+++ b/Makefile
@@ -858,6 +858,7 @@ LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
+LIB_OBJS += refs/sp-backend.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
 LIB_OBJS += replace_object.o
diff --git a/refs.c b/refs.c
index 339d4318e..1f7922733 100644
--- a/refs.c
+++ b/refs.c
@@ -1575,12 +1575,17 @@ static struct ref_store *lookup_ref_store_map(struct 
hashmap *map,
 static struct ref_store *ref_store_init(const char *gitdir,
unsigned int flags)
 {
-   const char *be_name = "files";
-   struct ref_storage_be *be = find_ref_storage_backend(be_name);
+   struct ref_storage_be *be;
struct ref_store *refs;
 
+   if (getenv("USE_SP")) {
+   be = &refs_be_sp;
+   } else {
+   be = &refs_be_files;
+   }
+
if (!be)
-   die("BUG: reference backend %s is unknown", be_name);
+   die("BUG: reference backend %s is unknown", "files");
 
refs = be->init(gitdir, flags);
return refs;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dd834314b..a8ec03d90 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -651,6 +651,7 @@ struct ref_storage_be {
 
 extern struct ref_storage_be refs_be_files;
 extern struct ref_storage_be refs_be_packed;
+extern struct ref_storage_be refs_be_sp;
 
 /*
  * A representation of the reference store for the main repository or
diff --git a/refs/sp-backend.c b/refs/sp-backend.c
new file mode 100644
index 0..31e8cec4b
--- /dev/null
+++ b/refs/sp-backend.c
@@ -0,0 +1,261 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "ref-cache.h"
+#include "packed-backend.h"
+#include "../iterator.h"
+#include "../dir-iterator.h"
+#include "../lockfile.h"
+#include "../object.h"
+#include "../dir.h"
+#include "../submodule.h"
+
+/*
+ * Future: need to be in "struct repository"
+ * when doing a full libification.
+ */
+struct sp_ref_store {
+   struct ref_store base;
+   unsigned int store_flags;
+
+   /*
+* Ref store of this repository (the submodule), used only for the
+* reflog.
+*/
+   struct ref_store *files;
+
+   /*
+* Ref store of the superproject, for refs.
+*/
+   struct ref_store *files_superproject;
+};
+
+/*
+ * Create a new submodule ref cache and add it to the internal
+ * set of caches.
+ */
+static struct ref_store *sp_init(const char *gitdir, unsigned int flags)
+{
+   struct sp_ref_store *refs = xcalloc(1, sizeof(*refs));
+   struct ref_store *ref_store = (struct ref_store *)refs;
+
+   base_ref_store_init(ref_store, &refs_be_sp);
+   refs->store_flags = flags;
+   refs->files = refs_be_files.init(gitdir, flags);
+
+   return ref_store;
+}
+
+/*
+ * Downcast ref_store to sp_ref_store. Die if ref_store is not a
+ * sp_ref_store. required_flags is compared with ref_store's
+ * store_flags to ensure the ref_store has all required capabilities.
+ * "caller" is used in any necessary error messages.
+ */
+static struct sp_ref_store *sp_downcast(struct ref_store *ref_store,
+ unsigned int required_flags,
+ const char *caller)
+{
+   struct sp_ref_store *refs;
+
+   if (ref_store->be != &refs_be_sp)
+   die("BUG: ref_store is type \"%s\" not \"sp\" in %s",
+   ref_store->be->name, caller);
+
+   refs = (struct sp_ref_store *)ref_store;
+
+   if ((refs->store_flags & required_flags) != required_flags)
+   die("BUG: operation %s requires abilities 0x%x, but only have 
0x%x",
+   caller, required_flags, refs->store_flags);
+
+   return refs;
+}
+
+static int sp_read_raw_ref(s

[WIP 0/2] Submodule ref backend that mirrors superproject

2017-12-01 Thread Jonathan Tan
I sent out an earlier email [1] that discusses the idea of a submodule
ref backend that mirrors the superproject. Basically, if this backend is
used (for example, through a configuration option), the submodule itself
will not store any "refs/..." refs, but will check the gitlink of the
commit of the ref of the same name in the superproject. For example, if
the superproject has at refs/heads/foo:

superproject/
  sub [gitlink -> 1234...]

and at refs/heads/bar:

superproject/
  sub [gitlink -> 5678...]

Inside sub, "git rev-parse foo" will output "1234...", and "git rev-parse
bar" will output "5678...", even though "foo" and "bar" are not defined
anywhere inside sub.

(The submodule handles refs that do not start with "refs/" - for
example, HEAD and FETCH_HEAD - like usual.)
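In other words, resolving such a ref inside sub would be roughly
equivalent to asking the superproject for the gitlink recorded at that
ref:

    # run inside sub; ".." is the superproject working tree
    git -C .. rev-parse refs/heads/foo:sub    # prints 1234...
    git -C .. rev-parse refs/heads/bar:sub    # prints 5678...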

[1] also describes what happens when the submodule attempts to write to
any "refs/..." ref.

For those interested, here's what such an implementation might look
like, and a test to demonstrate such functionality. I have partial
read-only functionality - a lot of it still remains to be done.

[1] 
https://public-inbox.org/git/20171108172945.33c42a0e91b4ac494217b...@google.com/

Jonathan Tan (2):
  submodule: refactor acquisition of superproject info
  submodule: read-only super-backed ref backend

 Makefile   |   1 +
 refs.c |  11 +-
 refs/refs-internal.h   |   1 +
 refs/sp-backend.c  | 261 +
 submodule.c| 107 +++--
 submodule.h|   5 +
 t/t1406-submodule-ref-store.sh |  26 
 7 files changed, 374 insertions(+), 38 deletions(-)
 create mode 100644 refs/sp-backend.c

-- 
2.15.0.531.g2ccb3012c9-goog



[WIP 1/2] submodule: refactor acquisition of superproject info

2017-12-01 Thread Jonathan Tan
Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
 submodule.c | 76 +
 submodule.h |  3 +++
 2 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/submodule.c b/submodule.c
index bb531e0e5..ce511180e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1977,16 +1977,24 @@ void absorb_git_dir_into_superproject(const char 
*prefix,
}
 }
 
-const char *get_superproject_working_tree(void)
+/*
+ * Get the full filename of the gitlink entry corresponding to the
+ * superproject having this repo as a submodule.
+ *
+ * full_name will point to an internal buffer upon success.
+ *
+ * Return 0 if successful, 1 if not (for example, if the parent
+ * directory is not a repo or an unrelated one).
+ */
+int get_superproject_entry(const char **full_name)
 {
+   static struct strbuf sb = STRBUF_INIT;
+
struct child_process cp = CHILD_PROCESS_INIT;
-   struct strbuf sb = STRBUF_INIT;
const char *one_up = real_path_if_valid("../");
const char *cwd = xgetcwd();
-   const char *ret = NULL;
const char *subpath;
int code;
-   ssize_t len;
 
if (!is_inside_work_tree())
/*
@@ -1994,11 +2002,15 @@ const char *get_superproject_working_tree(void)
 * We might have a superproject, but it is harder
 * to determine.
 */
-   return NULL;
+   return 1;
 
if (!one_up)
-   return NULL;
+   return 1;
 
+   /*
+* No need to reset sb, since relative_path() will do it if
+* necessary.
+*/
subpath = relative_path(cwd, one_up, &sb);
 
prepare_submodule_repo_env(&cp.env_array);
@@ -2017,45 +2029,49 @@ const char *get_superproject_working_tree(void)
if (start_command(&cp))
die(_("could not start ls-files in .."));
 
-   len = strbuf_read(&sb, cp.out, PATH_MAX);
+   strbuf_read(&sb, cp.out, PATH_MAX);
close(cp.out);
 
-   if (starts_with(sb.buf, "16")) {
-   int super_sub_len;
-   int cwd_len = strlen(cwd);
-   char *super_sub, *super_wt;
+   code = finish_command(&cp);
 
+   if (starts_with(sb.buf, "16")) {
/*
 * There is a superproject having this repo as a submodule.
 * The format is <mode> SP <hash> SP <stage> TAB <full name> \0,
 * We're only interested in the name after the tab.
 */
-   super_sub = strchr(sb.buf, '\t') + 1;
-   super_sub_len = sb.buf + sb.len - super_sub - 1;
-
-   if (super_sub_len > cwd_len ||
-   strcmp(&cwd[cwd_len - super_sub_len], super_sub))
-   die (_("BUG: returned path string doesn't match cwd?"));
-
-   super_wt = xstrdup(cwd);
-   super_wt[cwd_len - super_sub_len] = '\0';
-
-   ret = real_path(super_wt);
-   free(super_wt);
+   char *tab = strchr(sb.buf, '\t');
+   if (!tab)
+   die("BUG: ls-files returned line with no tab");
+   *full_name = tab + 1;
+   return 0;
}
-   strbuf_release(&sb);
-
-   code = finish_command();
 
if (code == 128)
/* '../' is not a git repository */
-   return NULL;
-   if (code == 0 && len == 0)
+   return 1;
+   if (code == 0)
/* There is an unrelated git repository at '../' */
+   return 1;
+   die(_("ls-tree returned unexpected return code %d"), code);
+}
+
+const char *get_superproject_working_tree(void)
+{
+   const char *full_name;
+   char *super_wt;
+   size_t len;
+   const char *ret;
+
+   if (get_superproject_entry(&full_name))
return NULL;
-   if (code)
-   die(_("ls-tree returned unexpected return code %d"), code);
 
+   super_wt = xstrdup(xgetcwd());
+   if (!strip_suffix(super_wt, full_name, &len))
+   die("BUG: returned path string doesn't match cwd?");
+   super_wt[len] = '\0';
+   ret = real_path(super_wt);
+   free(super_wt);
return ret;
 }
 
diff --git a/submodule.h b/submodule.h
index f0da0277a..29ab302cc 100644
--- a/submodule.h
+++ b/submodule.h
@@ -134,6 +134,9 @@ extern void absorb_git_dir_into_superproject(const char 
*prefix,
  * Return the absolute path of the working tree of the superproject, which this
  * project is a submodule of. If this repository is not a submodule of
  * another repository, return NULL.
+ *
+ * The return value, if not NULL, points to the same shared buffer as the one
+ * returned by real_path().
  */
 extern const char *get_superproject_working_tree(void);
 
-- 
2.15.0.531.g2ccb3012c9-goog



Re: [PATCH v3] diff: support anchoring line(s)

2017-11-30 Thread Jonathan Tan
On Thu, 30 Nov 2017 01:36:37 +0100 (CET)
Johannes Schindelin <johannes.schinde...@gmx.de> wrote:

> Hi Jonathan,
> 
> On Tue, 28 Nov 2017, Jonathan Tan wrote:
> 
> > @@ -4607,7 +4627,14 @@ int diff_opt_parse(struct diff_options *options,
> > DIFF_XDL_CLR(options, NEED_MINIMAL);
> > options->xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK;
> > options->xdl_opts |= value;
> > +   if (value == XDF_PATIENCE_DIFF)
> > +   clear_patience_anchors(options);
> > return argcount;
> > +   } else if (skip_prefix(arg, "--anchored=", )) {
> > +   options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
> > +   ALLOC_GROW(options->anchors, options->anchors_nr + 1,
> > +  options->anchors_alloc);
> > +   options->anchors[options->anchors_nr++] = xstrdup(arg);
> 
> I looked and failed to find the code that releases this array after the
> diff is done... did I miss anything?

You didn't miss anything. As far as I can tell, occurrences of struct
diff_options live throughout the lifetime of an invocation of Git and
are not freed. (Even if the struct itself is allocated on the stack, its
pathspec has some elements allocated on the heap.)


[PATCH on en/rename-progress v2] diffcore-rename: make diff-tree -l0 mean -l

2017-11-29 Thread Jonathan Tan
In the documentation of diff-tree, it is stated that the -l option
"prevents rename/copy detection from running if the number of
rename/copy targets exceeds the specified number". The documentation
does not mention any special handling for the number 0, but the
implementation before commit 9f7e4bfa3b ("diff: remove silent clamp of
renameLimit", 2017-11-13) treated 0 as a special value indicating that
the rename limit is to be a very large number instead.

The commit 9f7e4bfa3b changed that behavior, treating 0 as 0. Revert
this behavior to what it was previously. This allows existing scripts
and tools that use "-l0" to continue working. The alternative (to have
"-l0" suppress rename detection) is probably much less useful, since
users can just refrain from specifying -M and/or -C to have the same
effect.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
There are multiple issues with the commit message, so I am sending this
new version out.

v2 is exactly the same as previously, except that the commit message is
changed following Elijah Newren's and Jonathan Nieder's comments.
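For example, with this patch the historical meaning of -l0 is restored:

    # -l0 again means "use a very large rename limit" (32767 internally),
    # not "a rename limit of zero"
    git diff-tree -r -M -l0 HEAD^ HEAD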
---
 diffcore-rename.c  |  2 ++
 t/t4001-diff-rename.sh | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 9ca0eaec7..245e999fe 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -392,6 +392,8 @@ static int too_many_rename_candidates(int num_create,
 *
 *num_create * num_src > rename_limit * rename_limit
 */
+   if (rename_limit <= 0)
+   rename_limit = 32767;
if ((num_create <= rename_limit || num_src <= rename_limit) &&
((uint64_t)num_create * (uint64_t)num_src
 <= (uint64_t)rename_limit * (uint64_t)rename_limit))
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 0d1fa45d2..eadf4f624 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -230,4 +230,19 @@ test_expect_success 'rename pretty print common prefix and 
suffix overlap' '
test_i18ngrep " d/f/{ => f}/e " output
 '
 
+test_expect_success 'diff-tree -l0 defaults to a big rename limit, not zero' '
+   test_write_lines line1 line2 line3 >myfile &&
+   git add myfile &&
+   git commit -m x &&
+
+   test_write_lines line1 line2 line4 >myotherfile &&
+   git rm myfile &&
+   git add myotherfile &&
+   git commit -m x &&
+
+   git diff-tree -M -l0 HEAD HEAD^ >actual &&
+   # Verify that a rename from myotherfile to myfile was detected
+   grep "myotherfile.*myfile" actual
+'
+
 test_done
-- 
2.15.0.173.g9268cf4a2.dirty



Re: [PATCH on en/rename-progress] diffcore-rename: make diff-tree -l0 mean -l

2017-11-29 Thread Jonathan Tan
On Wed, 29 Nov 2017 10:51:20 -0800
Elijah Newren  wrote:

> Thanks for testing that version and sending along the fix.
> 
> I suspect the commit referenced twice in the commit message should
> have been 9f7e4bfa3b ("diff: remove silent clamp of renameLimit",
> 2017-11-13) rather than b520abf ("sequencer: warn when internal merge
> may be suboptimal due to renameLimit", 2017-11-14).

Ah...yes, you're right. I'll update the commit message if I need to send
out a new version or if Junio requests it (so that it's easier for him
to merge it in).

> Other than that minor issue, patch and test looks good to me.

Thanks.


[PATCH on en/rename-progress] diffcore-rename: make diff-tree -l0 mean -l

2017-11-29 Thread Jonathan Tan
In the documentation of diff-tree, it is stated that the -l option
"prevents rename/copy detection from running if the number of
rename/copy targets exceeds the specified number". The documentation
does not mention any special handling for the number 0, but the
implementation before commit b520abf ("sequencer: warn when internal
merge may be suboptimal due to renameLimit", 2017-11-14) treated 0 as a
special value indicating that the rename limit is to be a very large
number instead.

The commit b520abf changed that behavior, treating 0 as 0. Revert this
behavior to what it was previously. This allows existing scripts and
tools that use "-l0" to continue working. The alternative (to allow
"-l0") is probably much less useful, since users can just refrain from
specifying -M and/or -C to have the same effect.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
Note that this patch is built on en/rename-progress.

We noticed this through an automated test for an internal tool - the
tool uses git diff-tree with -l0, and no longer produces the same
results as before.
---
 diffcore-rename.c  |  2 ++
 t/t4001-diff-rename.sh | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 9ca0eaec7..245e999fe 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -392,6 +392,8 @@ static int too_many_rename_candidates(int num_create,
 *
 *num_create * num_src > rename_limit * rename_limit
 */
+   if (rename_limit <= 0)
+   rename_limit = 32767;
if ((num_create <= rename_limit || num_src <= rename_limit) &&
((uint64_t)num_create * (uint64_t)num_src
 <= (uint64_t)rename_limit * (uint64_t)rename_limit))
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 0d1fa45d2..eadf4f624 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -230,4 +230,19 @@ test_expect_success 'rename pretty print common prefix and 
suffix overlap' '
test_i18ngrep " d/f/{ => f}/e " output
 '
 
+test_expect_success 'diff-tree -l0 defaults to a big rename limit, not zero' '
+   test_write_lines line1 line2 line3 >myfile &&
+   git add myfile &&
+   git commit -m x &&
+
+   test_write_lines line1 line2 line4 >myotherfile &&
+   git rm myfile &&
+   git add myotherfile &&
+   git commit -m x &&
+
+   git diff-tree -M -l0 HEAD HEAD^ >actual &&
+   # Verify that a rename from myotherfile to myfile was detected
+   grep "myotherfile.*myfile" actual
+'
+
 test_done
-- 
2.15.0.173.g9268cf4a2.dirty



[PATCH v3] diff: support anchoring line(s)

2017-11-28 Thread Jonathan Tan
Teach diff a new algorithm, one that attempts to prevent user-specified
lines from appearing as a deletion or addition in the end result. The
end user can use this by specifying "--anchored=" one or more
times when using Git commands like "diff" and "show".

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
v3 addresses Junio's comments.

> This makes sense, but "--diff-algorithm=patience" would want to do
> the same, I suspect, so the loop would want to become a little
> helper function "clear_patience_anchors(options)" or something like
> that.

Done. Also updated a test to test the interaction of
"--diff-algorithm=patience" with "--anchored=<>".

> This may be too little to matter, but I'd find
>
>   printf "%s\n" a b c >pre
>
> vastly easier to read.  Or perhaps just use
>
>   test_write_lines a b c >pre

Thanks for the pointer to test_write_lines - I used that and it is
indeed clearer.
---
 Documentation/diff-options.txt |  10 +
 diff.c |  31 -
 diff.h |   4 ++
 t/t4064-diff-anchored.sh   | 100 +
 xdiff/xdiff.h  |   4 ++
 xdiff/xpatience.c  |  42 ++---
 6 files changed, 184 insertions(+), 7 deletions(-)
 create mode 100755 t/t4064-diff-anchored.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1..6ce39fb69 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -80,6 +80,16 @@ endif::git-format-patch[]
 --histogram::
Generate a diff using the "histogram diff" algorithm.
 
+--anchored=<text>::
+   Generate a diff using the "anchored diff" algorithm.
++
+This option may be specified more than once.
++
+If a line exists in both the source and destination, exists only once,
+and starts with this text, this algorithm attempts to prevent it from
+appearing as a deletion or addition in the output. It uses the "patience
+diff" algorithm internally.
+
 --diff-algorithm={patience|minimal|histogram|myers}::
Choose a diff algorithm. The variants are as follows:
 +
diff --git a/diff.c b/diff.c
index 0763e8926..7149ff627 100644
--- a/diff.c
+++ b/diff.c
@@ -3210,6 +3210,8 @@ static void builtin_diff(const char *name_a,
ecbdata.opt = o;
ecbdata.header = header.len ? &header : NULL;
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -3302,6 +3304,8 @@ static void builtin_diffstat(const char *name_a, const 
char *name_b,
memset(, 0, sizeof(xpp));
memset(, 0, sizeof(xecfg));
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
if (xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
@@ -4487,6 +4491,21 @@ static int parse_ws_error_highlight_opt(struct 
diff_options *opt, const char *ar
return 1;
 }
 
+/*
+ * Clear the anchors configured for the PATIENCE_DIFF algorithm.
+ *
+ * This must be called whenever a command-line argument indicating that
+ * the patience algorithm is to be used is seen, because the patience
+ * and anchored algorithms both use PATIENCE_DIFF internally.
+ */
+static void clear_patience_anchors(struct diff_options *options)
+{
+   int i;
+   for (i = 0; i < options->anchors_nr; i++)
+   free(options->anchors[i]);
+   options->anchors_nr = 0;
+}
+
 int diff_opt_parse(struct diff_options *options,
   const char **av, int ac, const char *prefix)
 {
@@ -4594,9 +4613,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, INDENT_HEURISTIC);
else if (!strcmp(arg, "--no-indent-heuristic"))
DIFF_XDL_CLR(options, INDENT_HEURISTIC);
-   else if (!strcmp(arg, "--patience"))
+   else if (!strcmp(arg, "--patience")) {
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
-   else if (!strcmp(arg, "--histogram"))
+   clear_patience_anchors(options);
+   } else if (!strcmp(arg, "--histogram"))
options->xdl_opts = DIFF_WITH_ALG(options, HISTOGRAM_DIFF);
else if ((argcount = parse_long_opt("diff-algorithm", av, ))) {
long value = parse_algorithm_value(optarg);
@@ -4607,7 +4627,14 @@ int diff_opt_parse(struct diff_options *op

[PATCH v2] diff: support anchoring line(s)

2017-11-27 Thread Jonathan Tan
Teach diff a new algorithm, one that attempts to prevent user-specified
lines from appearing as a deletion or addition in the end result. The
end user can use this by specifying "--anchored=" one or more
times when using Git commands like "diff" and "show".

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
In v2, the UI treats it as a new algorithm. (Internally it's still an
option for "patience".)

Stefan suggested that I add a test where a single --anchored=<> matches
multiple lines, and that reminded me that the patience algorithm only
really works on unique lines. I have updated the documentation to remark
on that, and also added a test to show what happens. I have also added a
test to show that later algorithm options override earlier ones.
---
 Documentation/diff-options.txt | 10 +
 diff.c | 22 +-
 diff.h |  4 ++
 t/t4064-diff-anchored.sh   | 94 ++
 xdiff/xdiff.h  |  4 ++
 xdiff/xpatience.c  | 42 ---
 6 files changed, 169 insertions(+), 7 deletions(-)
 create mode 100755 t/t4064-diff-anchored.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1..6ce39fb69 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -80,6 +80,16 @@ endif::git-format-patch[]
 --histogram::
Generate a diff using the "histogram diff" algorithm.
 
+--anchored=<text>::
+   Generate a diff using the "anchored diff" algorithm.
++
+This option may be specified more than once.
++
+If a line exists in both the source and destination, exists only once,
+and starts with this text, this algorithm attempts to prevent it from
+appearing as a deletion or addition in the output. It uses the "patience
+diff" algorithm internally.
+
 --diff-algorithm={patience|minimal|histogram|myers}::
Choose a diff algorithm. The variants are as follows:
 +
diff --git a/diff.c b/diff.c
index 0763e8926..98aa638f8 100644
--- a/diff.c
+++ b/diff.c
@@ -3210,6 +3210,8 @@ static void builtin_diff(const char *name_a,
ecbdata.opt = o;
ecbdata.header = header.len ? &header : NULL;
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -3302,6 +3304,8 @@ static void builtin_diffstat(const char *name_a, const 
char *name_b,
memset(, 0, sizeof(xpp));
memset(, 0, sizeof(xecfg));
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
if (xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
@@ -4594,9 +4598,18 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, INDENT_HEURISTIC);
else if (!strcmp(arg, "--no-indent-heuristic"))
DIFF_XDL_CLR(options, INDENT_HEURISTIC);
-   else if (!strcmp(arg, "--patience"))
+   else if (!strcmp(arg, "--patience")) {
+   int i;
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
-   else if (!strcmp(arg, "--histogram"))
+   /*
+* Both --patience and --anchored use PATIENCE_DIFF
+* internally, so remove any anchors previously
+* specified.
+*/
+   for (i = 0; i < options->anchors_nr; i++)
+   free(options->anchors[i]);
+   options->anchors_nr = 0;
+   } else if (!strcmp(arg, "--histogram"))
options->xdl_opts = DIFF_WITH_ALG(options, HISTOGRAM_DIFF);
else if ((argcount = parse_long_opt("diff-algorithm", av, ))) {
long value = parse_algorithm_value(optarg);
@@ -4608,6 +4621,11 @@ int diff_opt_parse(struct diff_options *options,
options->xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK;
options->xdl_opts |= value;
return argcount;
+   } else if (skip_prefix(arg, "--anchored=", )) {
+   options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
+   ALLOC_GROW(options->anchors, options->anchors_nr + 1,
+  options->anchors_alloc);
+   options->anchors[options->anchors_nr++] = xstrdup(arg);
}
 
/* flags options */
diff --git a/diff.h b/diff.h
index 0fb18dd73..7cf276f07 100644
--- a/diff.h
+++ b/diff.h
@@ -166,6 +166,10 @@ s

Re: [PATCH] xdiff/xpatience: support anchoring line(s)

2017-11-27 Thread Jonathan Tan
On Thu, 23 Nov 2017 11:47:02 +0900
Junio C Hamano  wrote:

> Thinking about this a bit more, I do like the basic idea of the UI
> even better.  What we could do is to sell this to the end users as a
> new kind of diff algorithm choice (i.e. myers, patience, ... will
> gain a new friend) that internally happens to be implemented by
> piggybacking on patience (just like minimal is piggybacking on
> myers) and call it "anchor".  Then just like this command line
> 
> git diff --histogram --patience
> 
> makes the last one win without complaint, it is sane that these
> command lines
> 
> git diff --histogram --anchored=<text>
> git diff --anchored=<text> --histogram
> 
> make the last one win without complaint, either.
> 
> Hmm?

This sounds good. There will be a bit of inconsistency in that in "git
diff --anchored=<text1> --anchored=<text2>", it is not the last
one that wins, but both of them will in fact be used. But I think that
in practice, this will be fine.
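Concretely (the anchor texts here are just placeholders):

    # both anchors take effect
    git diff --anchored=first_func --anchored=second_func

    # a later algorithm option wins; the anchors then have no effect
    git diff --anchored=first_func --histogram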

I'll send out another version with this UI (and with Stefan's test
suggestion).


[PATCH] xdiff/xpatience: support anchoring line(s)

2017-11-22 Thread Jonathan Tan
Teach the patience diff to attempt preventing user-specified lines from
appearing as a deletion or addition in the end result. The end user can
use this by specifying "--anchor=" one or more times when using
Git commands like "diff" and "show".

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
Actual patch instead of RFC.

One thing that might help is to warn if --anchor is used without
--patience, but I couldn't find a good place to put that warning. Let me
know if you know of a good place.

Replying to Stefan's and Junio's comments:

> The solution you provide is a good thing to experiment with, but
> longer term, I would want to have a huge record of configs in which
> humans selected the best diff, such that we can use that data
> to reason about better automatic diff generation.
> The diff heuristic was based on a lot of human-generated data
> that was generated by Michael at the time. I wonder if we want to
> permanently store the anchor so the data collection will happen
> automatically over time.

I think machine learning is beyond the scope of this patch :-)

> or rather: "c is not moved, we don't care how the diff actually looks
> like",
> so maybe
>   ! grep "+c" diff

I think it's less error-prone to show "a" moving. With this, if the
command somehow prints nothing, the test would still pass.
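That is, comparing the two possible assertions against the same "diff"
output file as in the test:

    grep "^+a" diff     # fails loudly if the command printed nothing
    ! grep "+c" diff    # would still pass if the command printed nothing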

> While this is RFC, I should not expect comments, though it would
> be nice to have them in the final series. ;-)

Added comments :-)

> This is a natural extension of the idea the patience algorithm is
> built upon.  If this were a cumulative command line option that can
> be given to specify multiple lines and can be used across the diff
> family, it would make a welcome addition, I would think.

Specifying multiple lines - done.

By diff family, do you mean "diff", "show", and others? If yes, that has
also been done in this patch.
---
 Documentation/diff-options.txt |  9 +++
 diff.c |  8 ++
 diff.h |  4 +++
 t/t4033-diff-patience.sh   | 59 ++
 xdiff/xdiff.h  |  4 +++
 xdiff/xpatience.c  | 42 ++
 6 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1..b17d62696 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -100,6 +100,15 @@ For instance, if you configured diff.algorithm variable to 
a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
 
+--anchor=<text>::
+   This option may be specified more than once.
++
+If the "patience diff" algorithm is used, Git attempts to prevent lines
+starting with this text from appearing as a deletion or addition in the
+output.
++
+If another diff algorithm is used, this has no effect.
+
 --stat[=<width>[,<name-width>[,<count>]]]::
Generate a diffstat. By default, as much space as necessary
will be used for the filename part, and the rest for the graph
diff --git a/diff.c b/diff.c
index 0763e8926..29a7176ee 100644
--- a/diff.c
+++ b/diff.c
@@ -3210,6 +3210,8 @@ static void builtin_diff(const char *name_a,
ecbdata.opt = o;
ecbdata.header = header.len ? &header : NULL;
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -3302,6 +3304,8 @@ static void builtin_diffstat(const char *name_a, const 
char *name_b,
memset(, 0, sizeof(xpp));
memset(, 0, sizeof(xecfg));
xpp.flags = o->xdl_opts;
+   xpp.anchors = o->anchors;
+   xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
if (xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
@@ -4608,6 +4612,10 @@ int diff_opt_parse(struct diff_options *options,
options->xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK;
options->xdl_opts |= value;
return argcount;
+   } else if (skip_prefix(arg, "--anchor=", )) {
+   ALLOC_GROW(options->anchors, options->anchors_nr + 1,
+  options->anchors_alloc);
+   options->anchors[options->anchors_nr++] = xstrdup(arg);
}
 
/* flags options */
diff --git a/diff.h b/diff.h
index 0fb18dd73..7cf276f07 100644
--- a/diff.h
+++ b/diff.h
@@ -166,6 +166,10 @@ struct diff_options {
const char *stat_sep;
long xdl_opts;
 
+

Re: [PATCH v5 00/10] Partial clone part 2: fsck and promisors

2017-11-22 Thread Jonathan Tan
On Wed, Nov 22, 2017 at 10:00 AM, Jonathan Tan <jonathanta...@google.com> wrote:
> On Wed, 22 Nov 2017 14:25:13 +0900
> Junio C Hamano <gits...@pobox.com> wrote:
>
>> Thanks, will replace/queue all three series.  I am getting a feeling
>> that the first one is already ready for 'next', while the other two
>> may want to see a bit more comments?
>
> Yes, I think so too.
>
> Jeff Hostetler and I noticed some issues occurring when some other Git
> commands dynamically fetch objects due to the fact that those commands
> and fetch-pack use overlapping object flags. At the very least, we
> should look at that before it goes into next.

In the end, making fetch-pack refrain from setting object flags when
being used to dynamically fetch objects works (see the new "introduce
fetch-object: fetch one promisor object" commit that I just pushed to
[1]). I'll coordinate with Jeff Hostetler about sending v6 to the list
(most likely after the U.S. Thanksgiving holidays).

[1] https://github.com/jonathantanmy/git/commits/pc20171122


Re: [PATCH v5 00/10] Partial clone part 2: fsck and promisors

2017-11-22 Thread Jonathan Tan
On Wed, 22 Nov 2017 14:25:13 +0900
Junio C Hamano <gits...@pobox.com> wrote:

> Jeff Hostetler <g...@jeffhostetler.com> writes:
> 
> > From: Jeff Hostetler <jeffh...@microsoft.com>
> >
> > This is V5 of part 2 of partial clone.  This assumes V5 of part 1
> > is already present.  V5 includes minor cleanup over V4 and better
> > separates the --exclude-promisor-objects and --missing arguments.
> >
> > Part 2 is concerned with fsck, gc, initial support for dynamic
> > object fetching, and tracking promisor objects.  Jonathan Tan
> > originally developed this code.  I have moved it on top of
> > part 1 and updated it slightly.
> 
> Thanks, will replace/queue all three series.  I am getting a feeling
> that the first one is already ready for 'next', while the other two
> may want to see a bit more comments?

Yes, I think so too.

Jeff Hostetler and I noticed some issues occurring when some other Git
commands dynamically fetch objects due to the fact that those commands
and fetch-pack use overlapping object flags. At the very least, we
should look at that before it goes into next.


Re: [PATCH v5 0/6] Partial clone part 1: object filtering

2017-11-21 Thread Jonathan Tan
On Tue, 21 Nov 2017 20:58:46 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Here is V5 of the list-object filtering, rev-list, and pack-objects.
> 
> This version addresses comments on the V4 series.  I removed the
> questionable character encoding scheme.  And I removed or clarified
> use of the term "partial clone" to refer to a future feature.
> 
> Jeff Hostetler (6):
>   dir: allow exclusions from blob in addition to file
>   oidmap: add oidmap iterator methods
>   oidset: add iterator methods to oidset
>   list-objects: filter objects in traverse_commit_list
>   rev-list: add list-objects filtering support
>   pack-objects: add list-objects filtering

I checked the diff against v4 and this looks good.

I'm still unsure if pre-parsing the --filter argument into a struct
list_objects_filter_options is the best approach API-wise in the case
that we plan to send it to the server, but it does have the benefit of
failing (and informing the user) early if the filter is in the wrong
format, so I'm fine with this patch set as-is.


[RFC PATCH] xdiff/xpatience: support anchoring a line

2017-11-21 Thread Jonathan Tan
Teach the patience diff to support prohibiting a user-specified line
from appearing as a deletion or addition in the end result.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
I'm sending this out to see if a change similar to this would be
welcome. It is useful to me as a reviewer (to check my own code, e.g.
when checking [1]). Probably more design needs to go into this,
including the best way to specify the "anchor" line, and the correct
behavior when the anchor is either not found or appears more than once.

Any thoughts?

[1]
https://public-inbox.org/git/20171121221256.154741-1-jonathanta...@google.com/
---
 t/t4033-diff-patience.sh | 13 +
 xdiff/xpatience.c| 29 +++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/t/t4033-diff-patience.sh b/t/t4033-diff-patience.sh
index 113304dc5..2147fd688 100755
--- a/t/t4033-diff-patience.sh
+++ b/t/t4033-diff-patience.sh
@@ -13,6 +13,19 @@ test_expect_success '--ignore-space-at-eol with a single 
appended character' '
grep "^+.*X" diff
 '
 
+test_expect_success 'anchor' '
+   printf "a\nb\nc\n" >pre &&
+   printf "c\na\nb\n" >post &&
+
+   # without anchor, c is moved
+   test_expect_code 1 git diff --no-index --patience pre post >diff &&
+   grep "^+c" diff &&
+
+   # with anchor, a is moved
+   DIFF_ANCHOR=c test_expect_code 1 git diff --no-index --patience pre 
post >diff &&
+   grep "^+a" diff
+'
+
 test_diff_frobnitz "patience"
 
 test_diff_unique "patience"
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index a44e77632..195a60e57 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -62,6 +62,8 @@ struct hashmap {
 * initially, "next" reflects only the order in file1.
 */
struct entry *next, *previous;
+
+   unsigned anchor : 1;
} *entries, *first, *last;
/* were common records found? */
unsigned long has_matches;
@@ -70,6 +72,14 @@ struct hashmap {
xpparam_t const *xpp;
 };
 
+static int is_anchor(const char *line)
+{
+   char *anchor = getenv("DIFF_ANCHOR");
+   if (!anchor)
+   return 0;
+   return !strncmp(line, anchor, strlen(anchor));
+}
+
 /* The argument "pass" is 1 for the first file, 2 for the second. */
 static void insert_record(int line, struct hashmap *map, int pass)
 {
@@ -110,6 +120,7 @@ static void insert_record(int line, struct hashmap *map, 
int pass)
return;
map->entries[index].line1 = line;
map->entries[index].hash = record->ha;
+   map->entries[index].anchor = is_anchor(map->env->xdf1.recs[line - 
1]->ptr);
if (!map->first)
map->first = map->entries + index;
if (map->last) {
@@ -192,14 +203,28 @@ static struct entry *find_longest_common_sequence(struct 
hashmap *map)
int longest = 0, i;
struct entry *entry;
 
+   /*
+* If not -1, this entry in sequence must never be overridden. (Also,
+* do not override entries in sequence before this entry, since it is
+* useless.)
+*/
+   int anchor_i = -1;
+
for (entry = map->first; entry; entry = entry->next) {
if (!entry->line2 || entry->line2 == NON_UNIQUE)
continue;
i = binary_search(sequence, longest, entry);
entry->previous = i < 0 ? NULL : sequence[i];
-   sequence[++i] = entry;
-   if (i == longest)
+   ++i;
+   if (i <= anchor_i)
+   continue;
+   sequence[i] = entry;
+   if (anchor_i == -1 && entry->anchor) {
+   anchor_i = i;
+   longest = anchor_i + 1;
+   } else if (i == longest) {
longest++;
+   }
}
 
/* No common unique lines were found */
-- 
2.15.0.448.gf294e3d99a-goog



[PATCH] Tests: clean up submodule recursive helpers

2017-11-21 Thread Jonathan Tan
This continues the work in commit d3b5a49 ("Tests: clean up and document
submodule helpers", 2017-11-08).

Factor out the commonalities from
test_submodule_switch_recursing_with_args() and
test_submodule_forced_switch_recursing_with_args() in
lib-submodule-update.sh, and document their usage. Some tests differ
slightly in their test assertions; I have used the superset of those
assertions in that case.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
I checked my work by using a custom diff patch to "anchor" one line into
never appearing as a minus/plus in the diff, so that I could more
clearly see the deletions and additions, then opening 3 terminals: one
showing the common part, one showing the non-forced part, and one
showing the forced part. I'll send that diff patch to the mailing list
shortly [1].

[1] With my diff patch, you can run:
DIFF_ANCHOR=test_submodule_switch_recursing_with_args git show
--patience --color-moved
---
 t/lib-submodule-update.sh | 343 +-
 1 file changed, 125 insertions(+), 218 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index bb94c2320..f39ec3095 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -554,6 +554,10 @@ test_submodule_switch_common() {
 #  - if succeeds, once "git submodule update" is invoked, the contents of
 #submodule directories are updated
 #
+# If the command under test is known to not work with submodules in certain
+# conditions, set the appropriate KNOWN_FAILURE_* variable used in the tests
+# below to 1.
+#
 # Use as follows:
 #
 # my_func () {
@@ -622,19 +626,11 @@ test_submodule_forced_switch () {
 # - Removing a submodule with a git directory absorbs the submodules
 #   git directory first into the superproject.
 
-test_submodule_switch_recursing_with_args () {
-   cmd_args="$1"
-   command="git $cmd_args --recurse-submodules"
-   RESULTDS=success
-   if test "$KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS" = 1
-   then
-   RESULTDS=failure
-   fi
-   RESULTOI=success
-   if test "$KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED" = 1
-   then
-   RESULTOI=failure
-   fi
+# Internal function; use test_submodule_switch_recursing_with_args() or
+# test_submodule_forced_switch_recursing_with_args() instead.
+test_submodule_recursing_with_args_common() {
+   command="$1"
+
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -648,7 +644,7 @@ test_submodule_switch_recursing_with_args () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... ignoring an empty existing directory ...
+   # ... ignoring an empty existing directory.
test_expect_success "$command: added submodule is checked out in empty 
dir" '
prolog &&
reset_work_tree_to_interested no_submodule &&
@@ -661,34 +657,6 @@ test_submodule_switch_recursing_with_args () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... unless there is an untracked file in its place.
-   test_expect_success "$command: added submodule doesn't remove untracked 
file with same name" '
-   prolog &&
-   reset_work_tree_to_interested no_submodule &&
-   (
-   cd submodule_update &&
-   git branch -t add_sub1 origin/add_sub1 &&
-   : >sub1 &&
-   test_must_fail $command add_sub1 &&
-   test_superproject_content origin/no_submodule &&
-   test_must_be_empty sub1
-   )
-   '
-   # ... but an ignored file is fine.
-   test_expect_$RESULTOI "$command: added submodule removes an untracked 
ignored file" '
-   test_when_finished "rm submodule_update/.git/info/exclude" &&
-   prolog &&
-   reset_work_tree_to_interested no_submodule &&
-   (
-   cd submodule_update &&
-   git branch -t add_sub1 origin/add_sub1 &&
-   : >sub1 &&
-   echo sub1 >.git/info/exclude &&
-   $command add_sub1 &&
-   test_superproject_content origin/add_sub1 &&
-   test_submodule_content sub1 origin/add_sub1
-   )
-   '
# Replacing a tracked file with a submodule p

Re: [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-11-21 Thread Jonathan Tan
On Thu, 16 Nov 2017 18:17:08 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> This is part 3 of a 3-part sequence for partial clone.  It assumes
> that part 1 and part 2 are in place.
> 
> This patch series is labeled as V4 to keep it in sync with
> the corresponding V4 versions of parts 1 and 2.  There was
> not a V3 version of this patch series.
> 
> Jonathan and I independently started on this task.  This is another
> pass at merging those efforts.  So there are several places that may
> need refactoring and cleanup, but fewer than in the previous submission.
> In particular, the test cases should be squashed and new tests added.
> 
> And I think we need more end-to-end tests.  I'll work on those next.

I think that it would be easier to review if the test for each command
was contained in the same patch as (or the patch immediately following)
the implementation of the command - for example, as in my modifications
[1].

(If you're about to send out v5, that's fine - maybe take this into
consideration for v6, if there is one.)

[1] https://github.com/jonathantanmy/git/commits/pc20171103


Re: [PATCH v4 00/10] Partial clone part 2: fsck and promisors

2017-11-16 Thread Jonathan Tan
I applied both this series and the first 9 patches of mine [1] on top
of part 1 of the entire partial clone implementation [2], and then
diffed the two results. I'll review just the differences between them.

You can see the entire diff below (minus means in my patch set but not
in Jeff's, plus means the contrary) - I did not trim it.

[1] https://public-inbox.org/git/cover.1506714999.git.jonathanta...@google.com/
[2] 
https://public-inbox.org/git/20171116180743.61353-1-...@jeffhostetler.com/T/#t

> diff --git a/Documentation/git-pack-objects.txt 
> b/Documentation/git-pack-objects.txt
> index 5fad696c5..33a824eed 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -242,9 +242,19 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it 
> creates a bundle.
>   the resulting packfile.  See linkgit:git-rev-list[1] for valid
>   `` forms.
>  
> ---missing=(error|allow-any):
> +--missing=(error|allow-any|allow-promisor):
>   Specifies how missing objects are handled.  This is useful, for
>   example, when there are missing objects from a prior partial clone.
> + This is stronger than `--missing=allow-promisor` because it limits
> + the traversal, rather than just silencing errors about missing
> + objects.

What is stronger than `--missing=allow-promisor`?

> +
> +--exclude-promisor-objects::
> + Omit objects that are known to be in the promisor remote". (This

Stray quote.

> + option has the purpose of operating only on locally created objects,
> + so that when we repack, we still maintain a distinction between
> + locally created objects [without .promisor] and objects from the
> + promisor remote [with .promisor].)  This is used with partial clone.
>  
>  SEE ALSO
>  
> diff --git a/Documentation/gitremote-helpers.txt 
> b/Documentation/gitremote-helpers.txt
> index 4a584f3c5..1ceab8944 100644
> --- a/Documentation/gitremote-helpers.txt
> +++ b/Documentation/gitremote-helpers.txt
> @@ -466,6 +466,12 @@ set by Git if the remote helper has the 'option' 
> capability.
>   Transmit  as a push option. As the push option
>   must not contain LF or NUL characters, the string is not encoded.
>  
> +'option from-promisor' {'true'|'false'}::
> + Indicate that these objects are being fetch by a promisor.

Should be: ...are being fetched from a promisor.

> +
> +'option no-haves' {'true'|'false'}::
> + Do not send "have" lines.
> +
>  SEE ALSO
>  
>  linkgit:git-remote[1]
> diff --git a/Documentation/rev-list-options.txt 
> b/Documentation/rev-list-options.txt
> index c84e46522..2beffe320 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -730,7 +730,7 @@ specification contained in .
>   Only useful with `--filter=`; prints a list of the omitted objects.
>   Object IDs are prefixed with a ``~'' character.
>  
> ---missing=(error|allow-any|print)::
> +--missing=(error|allow-any|allow-promisor|print)::
>   Specifies how missing objects are handled.  The repository may
>   have missing objects after a partial clone, for example.
>  +
> @@ -741,10 +741,20 @@ The value 'allow-any' will allow object traversal to 
> continue if a
>  missing object is encountered.  Missing objects will silently be omitted
>  from the results.
>  +
> +The value 'allow-promisor' is like 'allow-any' in that it will allow
> +object traversal to continue, but only for EXPECTED missing objects.
> ++
>  The value 'print' is like 'allow-any', but will also print a list of the
>  missing objects.  Object IDs are prefixed with a ``?'' character.
>  endif::git-rev-list[]
>  
> +--exclude-promisor-objects::
> + (For internal use only.)  Prefilter object traversal at
> + promisor boundary.  This is used with partial clone.  This is
> + stronger than `--missing=allow-promisor` because it limits the
> + traversal, rather than just silencing errors about missing
> + objects.
> +
>  --no-walk[=(sorted|unsorted)]::
>   Only show the given commits, but do not traverse their ancestors.
>   This has no effect if a range is specified. If the argument
> diff --git a/Documentation/technical/repository-version.txt 
> b/Documentation/technical/repository-version.txt
> index 074ccb9e0..e03eacceb 100644
> --- a/Documentation/technical/repository-version.txt
> +++ b/Documentation/technical/repository-version.txt
> @@ -87,14 +87,14 @@ When the config key `extensions.preciousObjects` is set 
> to `true`,
>  objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
>  `git repack -d`).
>  
> -`partialClone`
> +`partialclone`
>  ~~
>  
> -When the config key `extensions.partialClone` is set, a remote is
> -designated as a "promisor remote". Objects referenced by packed objects
> -obtained from that promisor remote do not need to be in the local repo.
> -Instead, the promisor remote promises that all such objects can be
> -fetched from it in the 

Re: [PATCH v4 5/6] rev-list: add list-objects filtering support

2017-11-16 Thread Jonathan Tan
On Thu, 16 Nov 2017 18:07:42 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Teach rev-list to use the filtering provided by the
> traverse_commit_list_filtered() interface to omit
> unwanted objects from the result.  This feature is
> intended to help with partial clone.
> 
> Object filtering is only allowed when one of the "--objects*"
> options are used.
> 
> When the "--filter-print-omitted" option is used, the omitted
> objects are printed at the end.  These are marked with a "~".
> This option can be combined with "--quiet" to get a list of
> just the omitted objects.
> 
> Added "--missing=(error|print|omit)" argument to specify how
> rev-list should behave when it encounters a missing object
> (presumably from a prior partial clone).
> 
> When "--missing=print" is used, rev-list will print a list of
> any missing objects that should have been included in the output.
> These are marked with a "?".
> 
> Add t6112 test.

The patch itself looks good, except that I have a nagging feeling about
the usage of the term "partial clone" in the commit message,
documentation, and test description. I feel that the usage here leads
one to believe that partial clones haphazardly leave repositories
without random objects (and at the point that this patch is merged,
there will not be any patch in the main repo contradicting this
viewpoint), contrary to the fact that we will have a tracking mechanism
to track which missing objects are expected to be missing. (If I'm the
only one feeling this way, though, then I'll just let it slide.)

If it were up to me, I would remove all existing mentions of "partial
clone" and explain the presence of the "--missing" argument as follows:

In the future, we will introduce a "partial clone" mechanism wherein
an object in a repo, obtained from a remote, may reference a missing
object that can be dynamically fetched from that remote once needed.
This "partial clone" mechanism will have a way, sometimes slow, of
determining if a missing link is one of the links expected to be
produced by this mechanism.

This patch introduces handling of missing objects to help debugging
and development of the "partial clone" mechanism, and once the
mechanism is implemented, for a power user to perform operations
that are missing-object-aware without incurring the cost of checking
if a missing link is expected.


Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-16 Thread Jonathan Tan
On Thu, 16 Nov 2017 18:07:41 +
Jeff Hostetler  wrote:

> +/*
> + * Return 1 if the given string needs armoring because of "special"
> + * characters that may cause injection problems when a command passes
> + * the argument to a subordinate command (such as when upload-pack
> + * launches pack-objects).
> + *
> + * The usual alphanumeric and key punctuation do not trigger it.
> + */ 
> +static int arg_needs_armor(const char *arg)

First of all, about the injection problem, replying to your previous e-mail
[1]:

https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/

> I couldn't use quote.[ch] because it is more concerned with
> quoting pathnames because of LF and CR characters within
> them -- rather than semicolons and quotes and the like which
> I was concerned about.

sq_quote_buf() (or one of the other similarly-named functions) should
solve this problem, right? The single quotes around the argument takes
care of LF, CR, and semicolons, and things like backslashes and quotes
are taken care of as documented.

I don't think we need to invent another encoding to solve this.
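
A minimal sketch of what I mean, using the existing sq_quote_buf() from
quote.h (the "args" and "untrusted_arg" names here are made up for
illustration):

	struct strbuf quoted = STRBUF_INIT;

	/*
	 * sq_quote_buf() wraps the argument in single quotes and
	 * escapes any embedded single quotes, so LF, CR, semicolons
	 * and the like cannot break out of the argument.
	 */
	sq_quote_buf(&quoted, untrusted_arg);
	argv_array_pushf(&args, "--filter=%s", quoted.buf);
	strbuf_release(&quoted);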

> +{
> + const unsigned char *p;
> +
> + for (p = (const unsigned char *)arg; *p; p++) {
> + if (*p >= 'a' && *p <= 'z')
> + continue;
> + if (*p >= 'A' && *p <= 'Z')
> + continue;
> + if (*p >= '0' && *p <= '9')
> + continue;
> + if (*p == '-' || *p == '_' || *p == '.' || *p == '/')
> + continue;

If we do take this approach, can ':' also be included?

> + if (skip_prefix(arg, "sparse:oid=", &v0)) {
> + struct object_context oc;
> + struct object_id sparse_oid;
> +
> + /*
> +  * Try to parse  into an OID for the current
> +  * command, but DO NOT complain if we don't have the blob or
> +  * ref locally.
> +  */
> + if (!get_oid_with_context(v0, GET_OID_BLOB,
> +   &sparse_oid, &oc))
> + filter_options->sparse_oid_value = oiddup(&sparse_oid);
> + filter_options->choice = LOFC_SPARSE_OID;
> + if (arg_needs_armor(v0))
> + filter_options->requires_armor = v0 - arg;
> + return 0;
> + }

In your previous e-mail, you mentioned:

> yes.  I always pass filter_options.raw_value over the wire.
> The code above tries to parse it and put it in an OID for
> private use by the current process -- just like the size limit
> value in the blob:limit filter.

So I think this function should complain if you don't have the blob or
ref locally. (I envision that if a filter string is to be directly sent
to a server, it should be stored as a string, not processed by this
function first.)
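
Concretely, something like this (just a sketch; the error message
wording is made up):

	if (get_oid_with_context(v0, GET_OID_BLOB, &sparse_oid, &oc))
		die(_("unable to resolve '%s' to a blob"), v0);
	filter_options->sparse_oid_value = oiddup(&sparse_oid);
	filter_options->choice = LOFC_SPARSE_OID;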


Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 16:30:43 -0800
Stefan Beller  wrote:

> The walking is performed in reverse order to show the introduction of a
> blob rather than its last occurrence.

The code as implemented here does not do this - it instead shows the last
occurrence.

>  NAME
>  
> -git-describe - Describe a commit using the most recent tag reachable from it
> +git-describe - Describe a commit or blob using the graph relations

I would write "Describe a commit or blob using a tag reachable from it".

> +If the given object refers to a blob, it will be described
> +as `<commit-ish>:<path>`, such that the blob can be found
> +at `<path>` in the `<commit-ish>`. Note, that the commit is likely
> +not the commit that introduced the blob, but the one that was found
> +first; to find the commit that introduced the blob, you need to find
> +the commit that last touched the path, e.g.
> +`git log <commit-ish> -- <path>`.
> +As blobs do not point at the commits they are contained in,
> +describing blobs is slow as we have to walk the whole graph.

I think some of this needs to be updated?

> +static void process_object(struct object *obj, const char *path, void *data)
> +{
> + struct process_commit_data *pcd = data;
> +
> + if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> + reset_revision_walk();
> + describe_commit(&pcd->current_commit, pcd->dst);
> + strbuf_addf(pcd->dst, ":%s", path);
> + pcd->revs->max_count = 0;
> + }
> +}

Setting max_count to 0 does not work when reverse is used, because the
commits are first buffered into revs->commits (see get_revision() in
revision.c). There doesn't seem to be a convenient way to terminate the
traversal immediately - I think setting revs->commits to NULL should
work (but I didn't check). Remember to free revs->commits (using
free_commit_list) first.
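
That is, something along these lines (untested sketch):

	if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
		reset_revision_walk();
		describe_commit(&pcd->current_commit, pcd->dst);
		strbuf_addf(pcd->dst, ":%s", path);
		/* terminate the traversal, even with --reverse */
		free_commit_list(pcd->revs->commits);
		pcd->revs->commits = NULL;
	}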

> +test_expect_success 'describe a blob at a tag' '
> + echo "make it a unique blob" >file &&
> + git add file && git commit -m "content in file" &&
> + git tag -a -m "latest annotated tag" unique-file &&
> + git describe HEAD:file >actual &&
> + echo "unique-file:file" >expect &&
> + test_cmp expect actual
> +'

This is probably better named "describe a blob at a directly tagged
commit". (Should we also test the case where a blob is directly
tagged?)

> +test_expect_success 'describe a blob with its last introduction' '
> + git commit --allow-empty -m "empty commit" &&
> + git rm file &&
> + git commit -m "delete blob" &&
> + git revert HEAD &&
> + git commit --allow-empty -m "empty commit" &&
> + git describe HEAD:file >actual &&
> + grep unique-file-3-g actual
> +'

The description is not true: firstly, this shows the last occurrence,
not the last introduction (you can verify this by adding another commit
and noticing that the contents of "actual" changes), and what we want is
not the last introduction anyway, but the first one.


Re: [PATCHv4 6/7] builtin/describe.c: factor out describe_commit

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 16:30:42 -0800
Stefan Beller  wrote:

> In the next patch we'll learn how to describe more than just commits,
> so factor out describing commits into its own function.  That will make
> the next patches easy as we still need to describe a commit as part of
> describing blobs.
> 
> While factoring out the functionality to describe_commit, make sure
> that any output to stdout is put into a strbuf, which we can print
> afterwards, using puts which also adds the line ending.

I think the sentences here are a bit jumbled. The aim is to make the
next patch easier (your 2nd sentence), and how we accomplish that is by
factoring out the description of commits (1st sentence) *and* by
outputting to a strbuf because we need to postprocess the output further
when describing a commit as part of describing a blob (3rd sentence).


Re: [PATCHv4 5/7] builtin/describe.c: print debug statements earlier

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 16:30:41 -0800
Stefan Beller  wrote:

> For debuggers aid we'd want to print debug statements early, so
> introduce a new line in the debug output that describes the whole
> function, and then change the next debug output to describe why we
> need to search. Conveniently drop the arg from the second line;
> which will be useful in a follow up commit, that refactors the
> describe function.
> 
> This re-arrangement of debug printing is solely done for a better
> refactoring in the next commit, not to aid debugging git-describe,
> which is expected to have the same information readily available
> with the new prints.

This paragraph ("not to aid debugging") contradicts the previous one
("For debuggers aid").

Looking at this patch and the subsequent patches, I would write the
commit message like this:

When debugging, print the received argument at the start of the
function instead of in the middle. This ensures that the received
argument is printed in all code paths, and also allows a subsequent
refactoring to not need to move the "arg" parameter.

Also change the title to "print arg earlier when debugging" or something
like that.


Re: [PATCHv4 3/7] revision.h: introduce blob/tree walking in order of the commits

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 16:30:39 -0800
Stefan Beller  wrote:

> The change in list-objects.c is rather minimal as we'll be re-using
> the infrastructure put in place of the revision walking machinery. For
> example one could expect that add_pending_tree is not called, but rather
> commit->tree is directly passed to the tree traversal function. This
> however requires a lot more code than just emptying the queue containing
> trees after each commit.

Ah, I see. You're calling add_pending_tree(), then
traverse_trees_and_blobs() will immediately flush the list of pending
trees.

I'm not sure that passing commit->tree directly to the tree traversal
function will require a lot more code, but even if it does, you should
at least add a NEEDSWORK - currently, flushing the list of pending trees
frees the array containing the list of pending trees, so each invocation
will need to reallocate a new array.
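
For example, something like this at the call site (a sketch only; the
comment wording and the strbuf argument name are mine):

	if (revs->tree_blobs_in_commit_order)
		/*
		 * NEEDSWORK: flushing the pending list here frees and
		 * reallocates its backing array once per commit; reuse
		 * the array if this turns out to be measurable.
		 */
		traverse_trees_and_blobs(revs, &csp, show_object, data);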


Re: [PATCHv3 7/7] builtin/describe.c: describe a blob

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 12:40:03 -0800
Stefan Beller  wrote:

> Thanks for the review!
> 
> This series was written with the mindset, that a user would only ever
> want to describe bad blobs. (bad in terms of file size, unwanted content, etc)
> 
> With the --reverse you only see the *first* introduction of said blob,
> so finding out if it was re-introduced is still not as easy, whereas "when
> was this blob last used" which is what the current algorithm does, covers
> that case better.

How does "when was this blob last used" cover reintroduction better? If
you want to check all introductions, then you'll need something like
what I describe (quoted below).

> > Alternatively, to me, it seems that listing commits that *introduces*
> > the blob (that is, where it references the blob, but none of its parents
> > do) would be the best way. That would then be independent of traversal
> > order (and we would no longer need to find a tag etc. to tie the blob
> > to).
> 
> What if it is introduced multiple times? (either in multiple competing
> side branches; or introduced, reverted and re-introduced?)

Then all of them should be listed.

> > If we do that, it seems to me that there is a future optimization that
> > could get the first commit to the user more quickly - once a commit
> > without the blob and a descendant commit with the blob is found, that
> > interval can be bisected, so that the first commit is found in O(log
> > number of commits) instead of O(commits). But this can be done later.
> 
> bisection assumes that we only have one "event" going from good to
> bad, which doesn't hold true here, as the blob can be there at different
> occasions of the history.

Yes, bisection would only be useful to output one commit more quickly.
We will still need to exhaustively search for all commits if the user
needs more than one.


Re: [PATCHv3 7/7] builtin/describe.c: describe a blob

2017-11-14 Thread Jonathan Tan
On Thu,  2 Nov 2017 12:41:48 -0700
Stefan Beller  wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> "This is an interesting endeavor, because describing things is hard."
>   -- me, upon writing this patch.
> 
> When describing commits, we try to anchor them to tags or refs, as these
> are conceptually on a higher level than the commit. And if there is no ref
> or tag that matches exactly, we're out of luck.  So we employ a heuristic
> to make up a name for the commit. These names are ambiguous, there might
> be different tags or refs to anchor to, and there might be different
> path in the DAG to travel to arrive at the commit precisely.
> 
> When describing a blob, we want to describe the blob from a higher layer
> as well, which is a tuple of (commit, deep/path) as the tree objects
> involved are rather uninteresting.  The same blob can be referenced by
> multiple commits, so how we decide which commit to use?  This patch
> implements a rather naive approach on this: As there are no back pointers
> from blobs to commits in which the blob occurs, we'll start walking from
> any tips available, listing the blobs in-order of the commit and once we
> found the blob, we'll take the first commit that listed the blob.  For
> source code this is likely not the first commit that introduced the blob,
> but rather the latest commit that contained the blob.  For example:
> 
>   git describe v0.99:Makefile
>   v0.99-5-gab6625e06a:Makefile
> 
> tells us the latest commit that contained the Makefile as it was in tag
> v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
> commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
> 2005-07-11) touches the Makefile.
> 
> Let's see how this description turns out, if it is useful in day-to-day
> use as I have the intuition that we'd rather want to see the *first*
> commit that this blob was introduced to the repository (which can be
> achieved easily by giving the `--reverse` flag in the describe_blob rev
> walk).

The method of your intuition indeed seems better - could we just have
this from the start?

Alternatively, to me, it seems that listing commits that *introduces*
the blob (that is, where it references the blob, but none of its parents
do) would be the best way. That would then be independent of traversal
order (and we would no longer need to find a tag etc. to tie the blob
to).

If we do that, it seems to me that there is a future optimization that
could get the first commit to the user more quickly - once a commit
without the blob and a descendant commit with the blob is found, that
interval can be bisected, so that the first commit is found in O(log
number of commits) instead of O(commits). But this can be done later.


Re: [PATCHv3 5/7] builtin/describe.c: print debug statements earlier

2017-11-14 Thread Jonathan Tan
On Thu,  2 Nov 2017 12:41:46 -0700
Stefan Beller  wrote:

> For debuggers aid we'd want to print debug statements early, so
> introduce a new line in the debug output that describes the whole
> function, and then change the next debug output to describe why we
> need to search. Conveniently drop the arg from the second line;
> which will be useful in a follow up commit, that refactors the
> describe function.
> 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/describe.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index fd61f463cf..3136efde31 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
>   unsigned long seen_commits = 0;
>   unsigned int unannotated_cnt = 0;
>  
> + if (debug)
> + fprintf(stderr, _("describe %s\n"), arg);
> +

Could you explain in the commit message why this wasn't needed before
(if it wasn't needed), and why this is needed now?

>   if (get_oid(arg, &oid))
>   die(_("Not a valid object name %s"), arg);
>   cmit = lookup_commit_reference(&oid);
> @@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
>   if (!max_candidates)
>   die(_("no tag exactly matches '%s'"), 
> oid_to_hex(&cmit->object.oid));
>   if (debug)
> - fprintf(stderr, _("searching to describe %s\n"), arg);
> + fprintf(stderr, _("No exact match on refs or tags, searching to 
> describe\n"));

What is this arg that can be safely dropped?

You mention that it is for convenience (since the describe() function
will be refactored), but could the arg just be passed to the new
function?


Re: [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits

2017-11-14 Thread Jonathan Tan
On Thu,  2 Nov 2017 12:41:44 -0700
Stefan Beller  wrote:

> @@ -239,6 +239,8 @@ void traverse_commit_list(struct rev_info *revs,
>   if (commit->tree)
>   add_pending_tree(revs, commit->tree);
>   show_commit(commit, data);
> + if (revs->tree_blobs_in_commit_order)
> + traverse_trees_and_blobs(revs, &csp, show_object, data);
>   }
>   traverse_trees_and_blobs(revs, &csp, show_object, data);
>  

I would have expected add_pending_tree() above to no longer be invoked.
If it still needs to be invoked, maybe add an explanation in the form of
a comment or commit message.

> +test_expect_success 'rev-list --in-commit-order' '
> + for x in one two three four
> + do
> + echo $x >$x &&
> + git add $x &&
> + git commit -m "add file $x" ||
> + return 1
> + done &&
> + for x in four three
> + do
> + git rm $x &&
> + git commit -m "remove $x" ||
> + return 1
> + done &&
> + git rev-list --in-commit-order --objects HEAD >actual.raw &&
> + cut -c 1-40 >actual <actual.raw &&
> +
> + git cat-file --batch-check="%(objectname)" >expect.raw <<-\EOF &&
> + HEAD^{commit}
> + HEAD^{tree}
> + HEAD^{tree}:one
> + HEAD^{tree}:two
> + HEAD~1^{commit}
> + HEAD~1^{tree}
> + HEAD~1^{tree}:three
> + HEAD~2^{commit}
> + HEAD~2^{tree}
> + HEAD~2^{tree}:four
> + HEAD~3^{commit}
> + # HEAD~3^{tree} skipped, same as HEAD~1^{tree}
> + HEAD~4^{commit}
> + # HEAD~4^{tree} skipped, same as HEAD^{tree}
> + HEAD~5^{commit}
> + HEAD~5^{tree}
> + EOF
> + grep -v "#" >expect <expect.raw &&
> +
> + test_cmp expect actual
> +'

Would it be useful to have another test without --in-commit-order, so
that we can see the difference (and ensure that existing behavior is
unchanged)?


Re: [PATCH v2 2/4] progress: fix progress meters when dealing with lots of work

2017-11-13 Thread Jonathan Tan
On Mon, 13 Nov 2017 12:15:58 -0800
Elijah Newren  wrote:

> -static int display(struct progress *progress, unsigned n, const char *done)
> +static int display(struct progress *progress, uint64_t n, const char *done)
>  {
>   const char *eol, *tp;
>  
> @@ -106,7 +106,7 @@ static int display(struct progress *progress, unsigned n, 
> const char *done)
>   if (percent != progress->last_percent || progress_update) {
>   progress->last_percent = percent;
>   if (is_foreground_fd(fileno(stderr)) || done) {
> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
> + fprintf(stderr, "%s: %3u%% 
> (%"PRIuMAX"/%"PRIuMAX")%s%s",
>   progress->title, percent, n,
>   progress->total, tp, eol);

I think it would be better to cast the appropriate arguments to
uintmax_t - searching through the Git code shows that we do that in
several situations. Same for the rest of the diff.
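
That is, keep the variables as they are declared and cast at the call
site, along these lines:

	fprintf(stderr, "%s: %3u%% (%"PRIuMAX"/%"PRIuMAX")%s%s",
		progress->title, percent, (uintmax_t)n,
		(uintmax_t)progress->total, tp, eol);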


Re: [PATCH v2 2/4] progress: fix progress meters when dealing with lots of work

2017-11-13 Thread Jonathan Tan
On Mon, 13 Nov 2017 12:15:58 -0800
Elijah Newren  wrote:

> The possibility of setting merge.renameLimit beyond 2^16 raises the
> possibility that the values passed to progress can exceed 2^32.
> Use uint64_t, because it "ought to be enough for anybody".  :-)
> 
> Signed-off-by: Elijah Newren 
> ---
> This does imply 64-bit math for all progress operations.  Possible 
> alternatives
> I could think of listed at
> https://public-inbox.org/git/CABPp-BH1Cpc9UfYpmBBAHWSqadg=QuD=28qx1ov29zdvf4n...@mail.gmail.com/
> Opinions of others on whether 64-bit is okay, or preference for which 
> alternative
> is picked?

I haven't looked into this in much detail, but another alternative to
consider is to use size_t everywhere. This also allows us to use st_add
and st_mult, which checks overflow for us.

Changing progress to use the size_t of the local machine makes sense to
me.
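
As a sketch (the counter names here are invented), st_add() and
st_mult() die with an overflow error instead of silently wrapping:

	/* e.g. sizing the rename-detection work up front */
	size_t expected_work = st_mult(nr_sources, nr_destinations);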


Re: [RFC 0/3] Add support for --cover-at-tip

2017-11-13 Thread Jonathan Tan
On Mon, 13 Nov 2017 18:13:27 +0100
Nicolas Morey-Chaisemartin  wrote:

> v2:
> - Enhance mailinfo to parse patch series id from subject
> - Detect cover using mailinfo parsed ids in git am

I noticed that this was done in the patch set by searching for "PATCH" -
that is probably quite error-prone as not all patches will have a
subject line of that form. It may be better to search for "0/" and
ensure that it is immediately followed by an integer.
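
A sketch of the check I mean (hypothetical helper; assumes git's sane
isdigit()):

	static int looks_like_cover_letter(const char *subject)
	{
		/* matches "... 0/<n> ..." somewhere in the subject */
		const char *p = strstr(subject, "0/");

		return p && isdigit(p[2]);
	}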

Also, it might be worth checking the message IDs to ensure that the
PATCH M/Ns all indeed are replies to PATCH 0/N.

> - Support multiple patch series in a single run

Is this done? I would have expected that some buffering of messages
would be necessary, since you're writing a series of messages of the
form ... to the commits 

> TODO:
> - Add doc/comments
> - Add tests
> - Add a new "seperator" at the end of a cover letter.
>   Right now I added a triple dash to all cover letter (manual or 
> cover-at-tip) before shortlog/diff stat
>   This allows manually written cover letters to be handle by git am 
> --cover-at-tip without including the shortlog/diffstat but
>   breaks compat with older git am as it is seen has a malformed patch. A new 
> separator would solve that.

I think the triple dash works. I tried "git am" with a cover letter with
no triple dash, and it complains that the commit is empty anyway, so
compatibility with older git am might not be such a big issue. (With the
triple dash, it indeed complains about a malformed patch, as you
describe.)


Re: [RFC] protocol version 2

2017-11-10 Thread Jonathan Tan
On Fri, 20 Oct 2017 10:18:39 -0700
Brandon Williams  wrote:

> Some of the pain points with the current protocol spec are:

After some in-office discussion, I think that the most important pain
point is that we have to implement each protocol twice: once for
HTTP(S), and once for SSH (and friends) that support bidirectional byte
streams.

If it weren't for this, I think that what is discussed in this document
(e.g. ls-refs, fetch-object) could be less invasively accomplished with
v1, specifying "extra parameters" (explained in this e-mail [1]) to
merely tweak the output of upload-pack instead of replacing it nearly
completely, thus acting more as optimizations than changing the mode of
operation entirely.

[1] 
https://public-inbox.org/git/20171010193956.168385-1-jonathanta...@google.com/

>   * The server's initial response is the ref advertisement.  This
> advertisement cannot be omitted and can become an issue due to the
> sheer number of refs that can be sent with large repositories.  For
> example, when contacting the internal equivalent of
> `https://android.googlesource.com/`, the server will send
> approximately 1 million refs totaling 71MB.  This is data that is
> sent during each and every fetch and is not scalable.

For me, this is not a compelling one, because we can provide a ref
whitelist as an "extra parameter" in v1.

>   * Capabilities were implemented as a hack and are hidden behind a NUL
> byte after the first ref sent from the server during the ref
> advertisement:
> 
>    <ref>\0<capabilities>
> 
> Since they are sent in the context of a pkt-line they are also subject
> to the same length limitations (1k bytes with old clients).  While we
> may not be close to hitting this limitation with capabilities alone, it
> has become a problem when trying to abuse capabilities for other
> purposes (e.g. 
> [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).
> 
>   * Various other technical debt (e.g. abusing capabilities to
> communicate agent and symref data, service name set using a query
> parameter).

I think these 2 are the same - I would emphasize the fact that we cannot
add more stuff here, rather than the fact that we're putting this behind
NUL.

>  Special Packets
> -
> 
> In protocol v2 these special packets will have the following semantics:
> 
>   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

To address the pain point of HTTP(S) being different from the others
(mentioned above), I think the packet semantics should be further
qualified:

 - Communications must be divided up into packets terminated by a
   flush-pkt. Also, each side must be implemented without knowing
   whether packets-in-progress can or cannot be seen by the other side.
 - Each request packet must have a corresponding, possibly empty,
   response packet.
 - A request packet may be sent even if a response packet corresponding
   to a previously sent request packet is awaited. (This allows us to
   retain the existing optimization in fetch-pack wherein, during
   negotiation, the "have" request-response packet pairs are
   interleaved.)
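
To illustrate the pairing and pipelining rules above (schematic only,
not actual pkt-line syntax):

	C: request A
	C: flush-pkt
	C: request B    (sent before A's response arrives)
	C: flush-pkt
	S: response to A
	S: flush-pkt
	S: response to B
	S: flush-pkt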

This will allow us to more easily share code between HTTP(S) and the
others.

In summary, I think that we need a big motivation to make the jump from
v1 to v2, instead of merely making small changes to v1 (and I do think
that the proposed new commands, such as "ls-refs" and "fetch-object",
can be implemented merely by small changes). And I think that the
ability to better share code between HTTP(S) and others provides that
motivation.


Re: [RFC] cover-at-tip

2017-11-10 Thread Jonathan Tan
On Fri, 10 Nov 2017 16:37:49 +0100
Nicolas Morey-Chaisemartin  wrote:

> > Hi,
> >
> > I'm starting to look into the cover-at-tip topic that I found in the 
> > leftover bits (http://www.spinics.net/lists/git/msg259573.html)

Thanks - I personally would find this very useful.

> > Here's a first draft of a patch that adds support for format-patch 
> > --cover-at-tip. It compiles and works in my nice and user firnedly test 
> > case.
> > Just wanted to make sure I was going roughly in the right direction here.
> >
> >
> > I was wondering where is the right place to put a commit_is_cover_at_tip() 
> > as the test will be needed in other places as the feature is extended to git 
> > am/merge/pull.

I think you can put this in (root)/commit.c, especially since that test
operates on a "struct commit *".

> > Feel free to comment. I know the help is not clear at this point and 
> > there's still some work to do on option handling (add a config option, 
> > probably have --cover-at-tip imply --cover-letter, etc) and
> > some testing :)

Both are good ideas. You should probably use a
--cover-letter={no,auto,yes} instead of the current boolean, so that the
config can use the same options and configuring it to "auto" (to use a
cover letter if the tip is empty and singly-parented, and not to use a
cover letter otherwise) is meaningful.

> The proposed patch for format-patch does not output any "---" to signal the 
> end of the commit log and the beginning of the patch in the cover letter.
> This means that the log summary, the diffstat and the git footer ("-- "
> followed by the git version) is seen as part of the commit log. Which is
> just wrong.
> 
> Removing them would solve the issue but I feel they bring some useful info 
> (or they would not be here).
> Adding a "---" between the commit log and those added infos poses another 
> problem: git am does not see an empty patch anymore.
> I would need to add "some" level of parsing to am.c to make sure the patch 
> content is just garbage and that there are no actual hunks for that.

Could you just take the message from the commit and put that in the
cover letter? The summary and diffstat do normally have useful info, but
if the commit is specifically made to be used only for the cover letter,
I think that is no longer true.


Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Jonathan Tan
On Wed,  8 Nov 2017 16:10:07 -0800
Stefan Beller  wrote:

I thought of a possible alternative and how it would work.

> Possible data models and workflow implications
> ==
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.

What if, in the submodule, we have a new ref backend that mirrors the
superproject? When initializing the submodule, its original refs are not
cloned at all, but instead virtual refs are used.

Creation of brand-new refs is forbidden in the submodule.

When reading a ref in the submodule, if that ref is the current branch
in the superproject, read the corresponding gitlink entry in the index
(which may be dirty); otherwise read the gitlink in the tree of the tip
commit.

When updating a ref in the submodule, if that ref is the current branch
in the superproject, update the index; otherwise, create a commit on top
of the tip and update the ref to point to the new tip.

No synchronicity is enforced between superproject and submodule in terms
of HEAD, though: If a submodule is currently checked out to a branch,
and the gitlink for that branch is updated through whatever means, that
is equivalent to a "git reset --soft" in the submodule.

These rules seem straightforward to me (although I have been working
with Git for a while, so perhaps I'm not the best judge), and I think
leads to a good workflow, as discussed below.

> Workflows
> =
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  

(skipping the above, since they are either solved or not a core issue)

> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?

This is perhaps one weak point of my proposal - you can't work on a
submodule as if it were independent. You can checkout a branch and make
commits, but (i) they will automatically affect the superproject, and
(ii) the "origin/foo" etc. branches are those of the superproject. (But
if you checkout a detached HEAD, everything should still work.)

> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?

The above rules allow the following workflow:
 - "checkout --recurse-submodules" the branch you want on the
   superproject
 - make whatever changes you want in each submodule
 - commit each individual submodule (which updates the index of the
   superproject), then commit the superproject (we can introduce a
   commit --recurse-submodules to make this more convenient)
 - a "push --recurse-submodules" can be implemented to push the
   superproject and its submodules independently (and the same refspec
   can be legitimately used both when pushing the superproject and when
   pushing a submodule, since the ref names are the same, and not by
   coincidence)

If the user insists on making changes on a non-current branch (i.e. by
creating commits in submodules then using "git update-ref" or
equivalent), possibly multiple commits would be created in the
superproject, but the user can still squash them later if desired.

> * Working on a bug fix (Changing the feature that you currently work on, 
> branches)
>   -> How does switching branches in the superproject affect submodules

You will have to stash or commit your changes. (Which reminds me...GC in
the subproject will need to consult the reflog of the superproject too.)

> New type of symbolic refs
> =
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
> 
> HEAD pointing at the superprojects index
> 

Assuming we don't need synchronicity, the existing HEAD format can be
retained. To clarify what happens during ref writes, I'll reuse the
scenarios Stefan described:

> Ref write operations driven by the submodule, affecting target ref
>   e.g. git commit, reset --hard, update-ref (in the submodule)
> 
> The HEAD stays the same, pointing at the superproject.
> The gitlink is changed to the target sha1, using
> 
>   git -C <superproject> update-index --add \
>   --cacheinfo 160000,$SHA1,<path>
> 
> This will affect the superprojects index, such that then a commit in
> the superproject is needed.

In this proposal, the HEAD also stays the same (pointing at the branch).

Either the index is updated or a commit is needed. If a commit is
needed, it is automatically performed.

> Ref write operations driven by the superproject, changing the gitlink
>   e.g. git checkout , git reset --hard (in the 

Re: [RFC PATCH 0/4] git-status reports relation to superproject

2017-11-08 Thread Jonathan Tan
On Wed,  8 Nov 2017 11:55:05 -0800
Stefan Beller  wrote:

>   $ git -c status.superprojectinfo status
>   HEAD detached at v2.15-rc2
>   superproject is 6 commits behind HEAD 7070ce2..5e6d0fb
>   nothing to commit, working tree clean
> 
> How cool is that?
> 
> This series side steps the questions raised in
> https://public-inbox.org/git/xmqq4lq6hmp2.fsf...@gitster.mtv.corp.google.com/
> which I am also putting together albeit slowly.
> 
> This series just reports the relationship between the superprojects gitlink
> (if any) to HEAD. I think that is useful information in the current
> world of submodules.

The relationship is indeed currently useful, but if the long term plan
is to strongly discourage detached submodule HEAD, then I would think
that these patches are in the wrong direction. (If the long term plan is
to end up supporting both detached and linked submodule HEAD, then these
patches are fine, of course.) So I think that the plan referenced in
Junio's email (that you linked above) still needs to be discussed.

About the patches themselves, they look OK to me. Some minor things off
the top of my head are to retain the "ours" and "theirs" (instead of
"one" and "two"), and to replicate the language in remote.c more closely
("This submodule is (ahead of/behind) the superproject by %d commit(s)")
instead of inventing your own.


Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-08 Thread Jonathan Tan
On Wed, 8 Nov 2017 15:32:21 -0500
Jeff Hostetler  wrote:

> Thanks Jonathan.
> 
> I moved my version of part 2 on top of yesterday's part 1.
> There are a few changes between my version and yours. Could
> you take a quick look at them and see if they make sense?
> (I'll spare the mailing list another patch series until after
> I attend to the feed back on part 1.)
> 
> https://github.com/jeffhostetler/git/commits/core/pc3_p2

Thanks - the differences are quite minor, and they generally make sense.
The main one is that finish_object() in builtin/rev-list.c now returns
int instead of void, but that makes sense.

Other than that:

 - I think you accidentally squashed the rev-list commit into
   "sha1_file: support lazily fetching missing objects".
 - The documentation for --exclude-promisor-objects in
   git-pack-objects.txt should be "Omit objects that are known to be in
   the promisor remote". (This option has the purpose of operating only
   on locally created objects, so that when we repack, we still maintain
   a distinction between locally created objects [without .promisor] and
   objects from the promisor remote [with .promisor].)
 - The transport options in gitremote-helpers.txt could have the same
   documentation as in transport.h.


Re: [PATCH v3 6/6] pack-objects: add list-objects filtering

2017-11-07 Thread Jonathan Tan
On Tue,  7 Nov 2017 19:35:46 +
Jeff Hostetler  wrote:

> +--filter-ignore-missing:
> + Ignore missing objects without error.  This may be used with
> + or without any of the above filtering.

There is a discussion about this parameter (and the corresponding ones
in patch 5/6), to which I have just replied:

https://public-inbox.org/git/20171107164118.97cc65c4030de0922b19d...@google.com/

> @@ -3028,6 +3048,12 @@ int cmd_pack_objects(int argc, const char **argv, 
> const char *prefix)
>   if (!rev_list_all || !rev_list_reflog || !rev_list_index)
>   unpack_unreachable_expiration = 0;
>  
> + if (filter_options.choice) {
> + if (!pack_to_stdout)
> + die("cannot use filtering with an indexable pack.");

I'm not sure what an "indexable pack" is - probably a better message is
"cannot use --filter without --stdout".


Re: [PATCH v2 0/6] Partial clone part 1: object filtering

2017-11-07 Thread Jonathan Tan
On Fri, 3 Nov 2017 14:34:39 -0400
Jeff Hostetler  wrote:

> > Assuming we eventually get promisor support working, would there be
> > any use case where "any missing is OK" mode would be useful in a
> > sense more reasonable than "because we could have such a mode" and
> > "it is not our business to prevent users from playing with fire"?
> > 
> 
> For now, I'd like to keep my "any missing is OK" option.
> I do think it has value all by itself.
> 
> We are essentially using something like that now with our GVFS
> users on the gigantic Windows repo and haven't had any issues.
> 
> But yes, when we get promisor support working, we could revisit
> the need for this parameter.

Well, it's probably not a good idea to include this parameter, and then
subsequently remove it.

> However, I do have some scaling concerns here.  If for example,
> I have 100M missing blobs (because we did an only commits-and-trees
> clone), the cost to compute "promisor missing" vs "just missing"
> might be prohibitively expensive.  It could be something we want
> fsck/gc to be aware of, but other commands may want to just assume
> any missing objects are expected and continue.
> 
> Hopefully, we won't have a scale problem, but we just don't know
> yet.

I can see some use for this parameter - for example, when doing a report
for statistical purposes (percentage of objects missing, for example) or
for a background task that downloads missing objects into a cache. Also,
power users who know what they're doing (or normal users in an
emergency) can use this option when they have no network connection if
they really need to find something out from the local repo.

In these cases, the promisor check (after detecting that the object is
missing) is indeed not so useful, I think. (Or we can do the
--exclude=missing and --exclude=promisor idea that Jeff mentioned -
--exclude=missing now, and --exclude=promisor after we add promisor
support.)

This is conceptually different from gc's use of
--exclude-promisor-objects (in my patch set), which does not intend to
touch promisor objects (objects that are known to be in the promisor
remote), whether they are present or not.

Having said that, I would be OK if we didn't have tolerance (and/or
reporting) of missing objects right now. As far as I know, for the
initial implementation of partial clone, only the server performs any
filtering, and we assume that the server possesses all objects (so it
does not need to filter out any missing objects).


Re: [PATCH v3 4/6] list-objects: filter objects in traverse_commit_list

2017-11-07 Thread Jonathan Tan
On Tue,  7 Nov 2017 19:35:44 +
Jeff Hostetler  wrote:

> +/*
> + * Reject the arg if it contains any characters that might
> + * require quoting or escaping when handing to a sub-command.
> + */
> +static int reject_injection_chars(const char *arg)
> +{
[snip]
> +}

Someone pointed me to quote.{c,h}, which is probably sufficient to
ensure shell safety if we do invoke subcommands through the shell. If
that is so, we probably don't need a blacklist.

Having said that, though, it might be safer to still introduce one, and
relax it later if necessary - it is much easier to relax a constraint
than to increase one.

> + } else if (skip_prefix(arg, "sparse:", &v0)) {
> +
> + if (skip_prefix(v0, "oid=", &v1)) {
> + struct object_context oc;
> + struct object_id sparse_oid;
> + filter_options->choice = LOFC_SPARSE_OID;
> + if (!get_oid_with_context(v1, GET_OID_BLOB,
> +   &sparse_oid, &oc))
> + filter_options->sparse_oid_value =
> + oiddup(&sparse_oid);
> + return 0;
> + }

In your recent e-mail [1], you said that you will change it to always pass
the original expression - is that still the plan?

[1] 
https://public-inbox.org/git/f698d5a8-bf31-cea1-a8da-88b755b0b...@jeffhostetler.com/

> +/* Remember to update object flag allocation in object.h */

You probably can delete this line.

> +/*
> + * FILTER_SHOWN_BUT_REVISIT -- we set this bit on tree objects
> + * that have been shown, but should be revisited if they appear
> + * in the traversal (until we mark it SEEN).  This is a way to
> + * let us silently de-dup calls to show() in the caller.

This is unclear to me at first reading. Maybe something like:

  FILTER_SHOWN_BUT_REVISIT -- we set this bit on tree objects that have
  been shown, but should not be skipped over if they reappear in the
  traversal. This ensures that the tree's descendants are re-processed
  if the tree reappears subsequently, and that the tree is not shown
  twice.

> + * This
> + * is subtly different from the "revision.h:SHOWN" and the
> + * "sha1_name.c:ONELINE_SEEN" bits.  And also different from
> + * the non-de-dup usage in pack-bitmap.c
> + */

Optional: I'm not sure if this comparison is useful. (Maybe it is useful
to others, though.)

> +/*
> + * A filter driven by a sparse-checkout specification to only
> + * include blobs that a sparse checkout would populate.
> + *
> + * The sparse-checkout spec can be loaded from a blob with the
> + * given OID or from a local pathname.  We allow an OID because
> + * the repo may be bare or we may be doing the filtering on the
> + * server.
> + */
> +struct frame {
> + /*
> +  * defval is the usual default include/exclude value that
> +  * should be inherited as we recurse into directories based
> +  * upon pattern matching of the directory itself or of a
> +  * containing directory.
> +  */
> + int defval;

Can this be an "unsigned defval : 1" as well? In the function below, I
see that you assign to an "int val" first (which can take -1, 0, and 1)
before assigning to this, so that is fine.

Also, maybe a better name would be "exclude", with the documentation:

  1 if the directory is excluded, 0 otherwise. Excluded directories will
  still be recursed through, because an "include" rule for an object
  might override an "exclude" rule for one of its ancestors.


[PATCH v2] Tests: clean up and document submodule helpers

2017-11-07 Thread Jonathan Tan
Factor out the commonalities from test_submodule_switch() and
test_submodule_forced_switch() in lib-submodule-update.sh, and document
their usage.

This also makes explicit (through the KNOWN_FAILURE_FORCED_SWITCH_TESTS
variable) the fact that, currently, all functionality tested using
test_submodule_forced_switch() do not correctly handle the situation in
which a submodule is replaced with an ordinary directory.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
Thanks for the review.

Change from v1:

- changed commit message title
- moved a test to the common function, so that it runs both on
  test_submodule_switch and test_submodule_forced_switch
---
 t/lib-submodule-update.sh | 251 +-
 1 file changed, 46 insertions(+), 205 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 2d26f8680..bb94c2320 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -306,9 +306,9 @@ test_submodule_content () {
 # to protect the history!
 #
 
-# Test that submodule contents are currently not updated when switching
-# between commits that change a submodule.
-test_submodule_switch () {
+# Internal function; use test_submodule_switch() or
+# test_submodule_forced_switch() instead.
+test_submodule_switch_common() {
command="$1"
# Appearing submodule #
# Switching to a commit letting a submodule appear creates empty dir ...
@@ -332,7 +332,7 @@ test_submodule_switch () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... and doesn't care if it already exists ...
+   # ... and doesn't care if it already exists.
test_expect_$RESULT "$command: added submodule leaves existing empty 
directory alone" '
prolog &&
reset_work_tree_to no_submodule &&
@@ -347,19 +347,6 @@ test_submodule_switch () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... unless there is an untracked file in its place.
-   test_expect_success "$command: added submodule doesn't remove untracked 
unignored file with same name" '
-   prolog &&
-   reset_work_tree_to no_submodule &&
-   (
-   cd submodule_update &&
-   git branch -t add_sub1 origin/add_sub1 &&
-   >sub1 &&
-   test_must_fail $command add_sub1 &&
-   test_superproject_content origin/no_submodule &&
-   test_must_be_empty sub1
-   )
-   '
# Replacing a tracked file with a submodule produces an empty
# directory ...
test_expect_$RESULT "$command: replace tracked file with submodule 
creates empty directory" '
@@ -441,6 +428,11 @@ test_submodule_switch () {
# submodule files with the newly checked out ones in the
# directory of the same name while it shouldn't.
RESULT="failure"
+   elif test "$KNOWN_FAILURE_FORCED_SWITCH_TESTS" = 1
+   then
+   # All existing tests that use test_submodule_forced_switch()
+   # require this.
+   RESULT="failure"
else
RESULT="success"
fi
@@ -522,7 +514,6 @@ test_submodule_switch () {
test_submodule_content sub1 origin/modify_sub1
)
'
-
# Updating a submodule to an invalid sha1 doesn't update the
# submodule's work tree, subsequent update will fail
test_expect_$RESULT "$command: modified submodule does not update 
submodule work tree to invalid commit" '
@@ -555,42 +546,51 @@ test_submodule_switch () {
'
 }
 
-# Test that submodule contents are currently not updated when switching
-# between commits that change a submodule, but throwing away local changes in
-# the superproject is allowed.
-test_submodule_forced_switch () {
+# Declares and invokes several tests that, in various situations, checks that
+# the provided transition function:
+#  - succeeds in updating the worktree and index of a superproject to a target
+#commit, or fails atomically (depending on the test situation)
+#  - if succeeds, the contents of submodule directories are unchanged
+#  - if succeeds, once "git submodule update" is invoked, the contents of
+#submodule directories are updated
+#
+# Use as follows:
+#
+# my_func () {
+#   target=$1
+#   # Do something here that updates the worktree and index to match target,
+#   # but not any submodule directories.
+# }
+# test_submodule_switch "my_func"
+test_submodule_switch () {
command="$1"
-   ##

[PATCH] Tests: document test_submodule_{,forced_}switch()

2017-11-06 Thread Jonathan Tan
Factor out the commonalities from test_submodule_switch() and
test_submodule_forced_switch() in lib-submodule-update.sh, and document
their usage.

This also makes explicit (through the KNOWN_FAILURE_FORCED_SWITCH_TESTS
variable) the fact that, currently, all functionality tested using
test_submodule_forced_switch() do not correctly handle the situation in
which a submodule is replaced with an ordinary directory.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
I find tests that use lib-submodule-update.sh difficult to understand
due to the lack of clarity of what test_submodule_switch() and others do
with their argument - I hope this will make things easier for future
readers.
---
 t/lib-submodule-update.sh | 250 +-
 1 file changed, 46 insertions(+), 204 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 2d26f8680..ca428fdc9 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -306,9 +306,9 @@ test_submodule_content () {
 # to protect the history!
 #
 
-# Test that submodule contents are currently not updated when switching
-# between commits that change a submodule.
-test_submodule_switch () {
+# Internal function; use test_submodule_switch() or
+# test_submodule_forced_switch() instead.
+test_submodule_switch_common() {
command="$1"
# Appearing submodule #
# Switching to a commit letting a submodule appear creates empty dir ...
@@ -332,7 +332,7 @@ test_submodule_switch () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... and doesn't care if it already exists ...
+   # ... and doesn't care if it already exists.
test_expect_$RESULT "$command: added submodule leaves existing empty 
directory alone" '
prolog &&
reset_work_tree_to no_submodule &&
@@ -347,19 +347,6 @@ test_submodule_switch () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # ... unless there is an untracked file in its place.
-   test_expect_success "$command: added submodule doesn't remove untracked 
unignored file with same name" '
-   prolog &&
-   reset_work_tree_to no_submodule &&
-   (
-   cd submodule_update &&
-   git branch -t add_sub1 origin/add_sub1 &&
-   >sub1 &&
-   test_must_fail $command add_sub1 &&
-   test_superproject_content origin/no_submodule &&
-   test_must_be_empty sub1
-   )
-   '
# Replacing a tracked file with a submodule produces an empty
# directory ...
test_expect_$RESULT "$command: replace tracked file with submodule 
creates empty directory" '
@@ -441,6 +428,11 @@ test_submodule_switch () {
# submodule files with the newly checked out ones in the
# directory of the same name while it shouldn't.
RESULT="failure"
+   elif test "$KNOWN_FAILURE_FORCED_SWITCH_TESTS" = 1
+   then
+   # All existing tests that use test_submodule_forced_switch()
+   # require this.
+   RESULT="failure"
else
RESULT="success"
fi
@@ -522,7 +514,6 @@ test_submodule_switch () {
test_submodule_content sub1 origin/modify_sub1
)
'
-
# Updating a submodule to an invalid sha1 doesn't update the
# submodule's work tree, subsequent update will fail
test_expect_$RESULT "$command: modified submodule does not update 
submodule work tree to invalid commit" '
@@ -538,59 +529,53 @@ test_submodule_switch () {
test_submodule_content sub1 origin/add_sub1
)
'
-   # Updating a submodule from an invalid sha1 doesn't update the
-   # submodule's work tree, subsequent update will succeed
-   test_expect_$RESULT "$command: modified submodule does not update 
submodule work tree from invalid commit" '
-   prolog &&
-   reset_work_tree_to invalid_sub1 &&
-   (
-   cd submodule_update &&
-   git branch -t valid_sub1 origin/valid_sub1 &&
-   $command valid_sub1 &&
-   test_superproject_content origin/valid_sub1 &&
-   test_dir_is_empty sub1 &&
-   git submodule update --init --recursive &&
-   test_submodule_content sub1 origin/valid_sub1
-   )

Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-06 Thread Jonathan Tan
On Mon, 6 Nov 2017 12:32:45 -0500
Jeff Hostetler  wrote:

> >> Yes, that is a point I wanted to ask about.  I renamed the
> >> extensions.partialclone that you created and then I moved your
> >> remote..blob-max-bytes setting to be in extensions too.
> >> Moving it to core.partialclonefilter is fine.
> > 
> > OK - in that case, it might be easier to just reuse my first patch in
> > its entirety. "core.partialclonefilter" is not used until the
> > fetching/cloning part anyway.
> > 
> 
> Good point.  I'll take a look at refactoring that.
> If it looks like the result will be mostly/effectively
> your original patches, I'll let you know and hand part 2
> back to you.

Sounds good. I uploaded the result of rebasing my part 2 patches on top
of your part 1 patches here, if you would like it as a reference:

https://github.com/jonathantanmy/git/commits/pc20171106


Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list

2017-11-06 Thread Jonathan Tan
On Mon, 6 Nov 2017 12:51:52 -0500
Jeff Hostetler  wrote:

> Jonathan and I were talking off-list about the performance
> effects of inspecting the pathnames to identify the ".git*"
> special files. I added it in my first draft back in the spring,
> thinking that even if you set the blob-limit to a small
> number (or zero), you'd probably still always want the
> .gitattribute and .gitignore files.  But now with the addition
> of the sparse filter and functional dynamic object fetching,
> I'm not sure I see the need for this.
> 
> Also, if the primary use of the blob-limit is to filter out
> giant binary assets, it is unlikely anyone is going to have
> a 1MB+ .git* file, so it is unlikely that the is_special_file
> would include anything that wouldn't already be included by
> the size criteria.
> 
> So, if there's no objections, I think I'll remove this and
> simplify the blob-limit filter function.  (That would let me
> get rid of the provisional omit code here.)

This sounds like a good idea to me. (For the record, one of the
performance impacts of checking the filename is that bitmaps can't be
used to obtain a whitelist of what is to be packed - instead, a regular
object walk must be used.)


Re: [PATCH 04/14] fetch: add object filtering for partial fetch

2017-11-03 Thread Jonathan Tan
I did some of my own investigation and have a working (i.e. passing
tests) version of this patch here:

https://github.com/jonathantanmy/git/commits/pc20171103

If you want, you can use that, or incorporate the changes therein here.
I'll also remark on my findings inline.

On Thu,  2 Nov 2017 20:31:19 +
Jeff Hostetler  wrote:

> @@ -754,6 +757,7 @@ static int store_updated_refs(const char *raw_url, const 
> char *remote_name,
>   const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
>   int want_status;
>   int summary_width = transport_summary_width(ref_map);
> + struct check_connected_options opt = CHECK_CONNECTED_INIT;
>  
>   fp = fopen(filename, "a");
>   if (!fp)
> @@ -765,7 +769,7 @@ static int store_updated_refs(const char *raw_url, const 
> char *remote_name,
>   url = xstrdup("foreign");
>  
>   rm = ref_map;
> - if (check_connected(iterate_ref_map, &rm, NULL)) {
> + if (check_connected(iterate_ref_map, &rm, &opt)) {

opt here is unchanged from CHECK_CONNECTED_INIT, so this change is unnecessary.

>   rc = error(_("%s did not send all necessary objects\n"), url);
>   goto abort;
>   }
> @@ -1044,6 +1048,9 @@ static struct transport *prepare_transport(struct 
> remote *remote, int deepen)
>   set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
>   if (update_shallow)
>   set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
> + if (filter_options.choice)
> + set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
> +filter_options.raw_value);

You'll also need to set TRANS_OPT_FROM_PROMISOR.
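
For example (a sketch; I am assuming TRANS_OPT_FROM_PROMISOR follows the
same string-valued set_option() convention as the other transport
options here):

	if (filter_options.choice) {
		set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
			   filter_options.raw_value);
		set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
	}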

> @@ -1242,6 +1249,20 @@ static int fetch_multiple(struct string_list *list)
>   int i, result = 0;
>   struct argv_array argv = ARGV_ARRAY_INIT;
>  
> + if (filter_options.choice) {
> + /*
> +  * We currently only support partial-fetches to the remote
> +  * used for the partial-clone because we only support 1
> +  * promisor remote, so we DO NOT allow explicit command
> +  * line filter arguments.
> +  *
> +  * Note that the loop below will spawn background fetches
> +  * for each remote and one of them MAY INHERIT the proper
> +  * partial-fetch settings, so everything is consistent.
> +  */
> + die(_("partial-fetch is not supported on multiple remotes"));
> + }
> +
>   if (!append && !dry_run) {
>   int errcode = truncate_fetch_head();
>   if (errcode)

My intention in doing the "fetch: refactor calculation of remote list"
patch is so that the interaction between the provided list of remotes
and the specification of the filter can be handled using the following
diff:

-   if (remote)
+   if (remote) {
+   if (filter_options.choice &&
+   strcmp(remote->name, 
repository_format_partial_clone_remote))
+   die(_("--blob-max-bytes can only be used with the 
remote configured in core.partialClone"));
result = fetch_one(remote, argc, argv);
-   else
+   } else {
+   if (filter_options.choice)
+   die(_("--blob-max-bytes can only be used with the 
remote configured in core.partialClone"));
result = fetch_multiple(&list);
+   }

(Ignore the "blob-max-bytes" in the error message - that needs to be
updated.)

The GitHub link I provided above has this diff, and it seems to work.


Re: [PATCH 02/14] clone, fetch-pack, index-pack, transport: partial clone

2017-11-03 Thread Jonathan Tan
On Thu,  2 Nov 2017 20:31:17 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Signed-off-by: Jeff Hostetler 
> ---
>  builtin/clone.c  |  9 +
>  builtin/fetch-pack.c |  4 
>  builtin/index-pack.c | 10 ++
>  fetch-pack.c | 13 +
>  fetch-pack.h |  2 ++
>  transport-helper.c   |  5 +
>  transport.c  |  4 
>  transport.h  |  5 +
>  8 files changed, 52 insertions(+)

I managed to take a look at some of these patches. Firstly, consider
separating out the clone part, since it will not be tested until a few
patches later.

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a0a35e6..31cd5ba 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -222,6 +222,16 @@ static unsigned check_object(struct object *obj)
>   if (!(obj->flags & FLAG_CHECKED)) {
>   unsigned long size;
>   int type = sha1_object_info(obj->oid.hash, &size);
> +
> + if (type <= 0) {
> + /*
> +  * TODO Use the promisor code to conditionally
> +  * try to fetch this object -or- assume it is ok.
> +  */
> + obj->flags |= FLAG_CHECKED;
> + return 0;
> + }
> +
>   if (type <= 0)
>   die(_("did not receive expected object %s"),
> oid_to_hex(&obj->oid));

This causes some repo corruption tests to fail.

If I remove this and rebase the fetch-pack tests on top [1], the tests
pass, so this might not be necessary (for now, at least).

[1] https://github.com/jonathantanmy/git/commits/pc20171103


Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-03 Thread Jonathan Tan
On Fri, 3 Nov 2017 09:57:18 -0400
Jeff Hostetler <g...@jeffhostetler.com> wrote:

> On 11/2/2017 6:24 PM, Jonathan Tan wrote:
> > On Thu,  2 Nov 2017 20:20:44 +
> > Jeff Hostetler <g...@jeffhostetler.com> wrote:
> > 
> >> From: Jeff Hostetler <jeffh...@microsoft.com>
> >>
> >> Introduce the ability to have missing objects in a repo.  This
> >> functionality is guarded by new repository extension options:
> >>  `extensions.partialcloneremote` and
> >>  `extensions.partialclonefilter`.
> > 
> > With this, it is unclear what happens if extensions.partialcloneremote
> > is not set but extensions.partialclonefilter is set. For something as
> > significant as a repository extension (which Git uses to determine if it
> > will even attempt to interact with a repo), I think - I would prefer
> > just extensions.partialclone (or extensions.partialcloneremote, though I
> > prefer the former) which determines the remote (the important part,
> > which controls the dynamic object fetching), and have another option
> > "core.partialclonefilter" which is only useful if
> > "extensions.partialclone" is set.
> 
> Yes, that is a point I wanted to ask about.  I renamed the
> extensions.partialclone that you created and then I moved your
> remote..blob-max-bytes setting to be in extensions too.
> Moving it to core.partialclonefilter is fine.

OK - in that case, it might be easier to just reuse my first patch in
its entirety. "core.partialclonefilter" is not used until the
fetching/cloning part anyway.

I agree that "core.partialclonefilter" (or another place not in
"remote") instead of "remote..blob-max-bytes" is a good idea - in
the future, we might want to reuse the same filter setting for
non-fetching functionality.


Re: [PATCH 00/14] WIP Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-11-02 Thread Jonathan Tan
On Thu,  2 Nov 2017 20:31:15 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> This is part 3 of 3 for partial clone.
> It assumes that part 1 [1] and part 2 [2] are in place.
> 
> Part 3 is concerned with the commands: clone, fetch, upload-pack, fetch-pack,
> remote-curl, index-pack, and the pack-protocol.
> 
> Jonathan and I independently started on this task.  This is a first
> pass at merging those efforts.  So there are several places that need
> refactoring and cleanup.  In particular, the test cases should be
> squashed and new tests added.

Thanks. What are your future plans with this patch set? In particular, the
tests don't pass at HEAD^.

I took a quick glance to see if there were any issues that I could
immediately spot, but couldn't find any. I thought of fetch_if_missing,
but it seems that it is indeed used in this patch set (as expected).

I'll look at it more thoroughly, and feel free to let me know if there is
anything in particular you would like comments on.


Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-02 Thread Jonathan Tan
On Thu,  2 Nov 2017 20:20:44 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Introduce the ability to have missing objects in a repo.  This
> functionality is guarded by new repository extension options:
> `extensions.partialcloneremote` and
> `extensions.partialclonefilter`.

With this, it is unclear what happens if extensions.partialcloneremote
is not set but extensions.partialclonefilter is set. For something as
significant as a repository extension (which Git uses to determine if it
will even attempt to interact with a repo), I think - I would prefer
just extensions.partialclone (or extensions.partialcloneremote, though I
prefer the former) which determines the remote (the important part,
which controls the dynamic object fetching), and have another option
"core.partialclonefilter" which is only useful if
"extensions.partialclone" is set.

I also don't think extensions.partialclonefilter (or
core.partialclonefilter, if we use my suggestion) needs to be introduced
so early in the patch set when it will only be used once we start
fetching/cloning.

> +void partial_clone_utils_register(
> + const struct list_objects_filter_options *filter_options,
> + const char *remote,
> + const char *cmd_name)
> +{

This function is useful once we have fetch/clone, but probably not
before that. Since the fetch/clone patches are several patches ahead,
could this be moved there?

> @@ -420,6 +420,19 @@ static int check_repo_format(const char *var, const char 
> *value, void *vdata)
>   ;
>   else if (!strcmp(ext, "preciousobjects"))
>   data->precious_objects = git_config_bool(var, value);
> +
> + else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE))
> + if (!value)
> + return config_error_nonbool(var);
> + else
> + data->partial_clone_remote = xstrdup(value);
> +
> + else if (!strcmp(ext, KEY_PARTIALCLONEFILTER))
> + if (!value)
> + return config_error_nonbool(var);
> + else
> + data->partial_clone_filter = xstrdup(value);
> +
>   else
>   string_list_append(&data->unknown_extensions, ext);
>   } else if (strcmp(var, "core.bare") == 0) {

With a complicated block, probably better to use braces in these
clauses.
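
For example, braced, the clauses might read like this (same logic, just
a sketch; note that with braces the inner "else" can go away, since the
error path returns):

	else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE)) {
		if (!value)
			return config_error_nonbool(var);
		data->partial_clone_remote = xstrdup(value);
	}

	else if (!strcmp(ext, KEY_PARTIALCLONEFILTER)) {
		if (!value)
			return config_error_nonbool(var);
		data->partial_clone_filter = xstrdup(value);
	}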


Re: [PATCH v2 0/6] Partial clone part 1: object filtering

2017-11-02 Thread Jonathan Tan
On Thu,  2 Nov 2017 17:50:07 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Here is V2 of the list-object filtering. It replaces [1]
> and reflect a refactoring and simplification of the original.

Thanks, overall this looks quite good. I reviewed patches 2-6 (skipping
1 since it's already in next), made my comments on 4, and don't have any
for the rest (besides what's below).

> I've added "--filter-ignore-missing" parameter to rev-list and
> pack-objects to ignore missing objects rather than error out.
> This allows this patch series to better stand on its own and eliminates
> the need in part 1 for "patch 9" from V1.
> 
> This is a brute force ignore all missing objects.  Later, in part
> 2 or part 3 when --exclude-promisor-objects is introduced, we will
> be able to ignore EXPECTED missing objects.

(This is regarding patches 5 and 6.) Is the intention to support both
flags? (That is, --ignore-missing to ignore without checking whether the
object being missing is not unexpected, and --exclude-promisor-objects
to check and ignore.)


Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list

2017-11-02 Thread Jonathan Tan
On Thu,  2 Nov 2017 17:50:11 +
Jeff Hostetler  wrote:

> +int parse_list_objects_filter(struct list_objects_filter_options 
> *filter_options,
> +   const char *arg)

Returning void is fine, I think. It seems that all your code paths
either return 0 or die.

> +{
> + struct object_context oc;
> + struct object_id sparse_oid;
> + const char *v0;
> + const char *v1;
> +
> + if (filter_options->choice)
> + die(_("multiple object filter types cannot be combined"));
> +
> + /*
> +  * TODO consider rejecting 'arg' if it contains any
> +  * TODO injection characters (since we might send this
> +  * TODO to a sub-command or to the server and we don't
> +  * TODO want to deal with legacy quoting/escaping for
> +  * TODO a new feature).
> +  */
> +
> + filter_options->raw_value = strdup(arg);
> +
> + if (skip_prefix(arg, "blob:", ) || skip_prefix(arg, "blobs:", )) {
> + if (!strcmp(v0, "none")) {
> + filter_options->choice = LOFC_BLOB_NONE;
> + return 0;
> + }
> +
> + if (skip_prefix(v0, "limit=", ) &&
> + git_parse_ulong(v1, _options->blob_limit_value)) {
> + filter_options->choice = LOFC_BLOB_LIMIT;
> + return 0;
> + }
> + }
> + else if (skip_prefix(arg, "sparse:", )) {

Style: join the 2 lines above.

> + if (skip_prefix(v0, "oid=", )) {
> + filter_options->choice = LOFC_SPARSE_OID;
> + if (!get_oid_with_context(v1, GET_OID_BLOB,
> +   &sparse_oid, &oc)) {
> + /*
> +  * We successfully converted the <oid-ish>
> +  * into an actual OID.  Rewrite the raw_value
> +  * in canonical form with just the OID.
> +  * (If we send this request to the server, we
> +  * want an absolute expression rather than a
> +  * local-ref-relative expression.)
> +  */

I think this would lead to confusing behavior - for example, a fetch
with "--filter=oid=mybranch:sparseconfig" would have different results
depending on whether "mybranch" refers to a valid object locally.

The way I see it, this should either (i) only accept full 40-character
OIDs or (ii) retain the raw string to be interpreted only when the
filtering is done. (i) is simpler and safer, but is not so useful. In
both cases, if the user really wants client-side interpretation, they
can still use "$(git rev-parse foo)" to make it explicit.

> + free((char *)filter_options->raw_value);
> + filter_options->raw_value =
> + xstrfmt("sparse:oid=%s",
> + oid_to_hex(&sparse_oid));
> + filter_options->sparse_oid_value =
> + oiddup(&sparse_oid);
> + } else {
> + /*
> +  * We could not turn the <oid-ish> into an
> +  * OID.  Leave the raw_value as is in case
> +  * the server can parse it.  (It may refer to
> +  * a branch, commit, or blob we don't have.)
> +  */
> + }
> + return 0;
> + }
> +
> + if (skip_prefix(v0, "path=", )) {
> + filter_options->choice = LOFC_SPARSE_PATH;
> + filter_options->sparse_path_value = strdup(v1);
> + return 0;
> + }
> + }
> +
> + die(_("invalid filter expression '%s'"), arg);
> + return 0;
> +}

[snip]

> +void arg_format_list_objects_filter(
> + struct argv_array *argv_array,
> + const struct list_objects_filter_options *filter_options)

Is this function used anywhere (in this patch or subsequent patches)?

> diff --git a/list-objects-filter.c b/list-objects-filter.c
> +/* See object.h and revision.h */
> +#define FILTER_REVISIT (1<<25)

Looking later in the code, this flag indicates that a tree has been
SHOWN, so it might be better to just call this FILTER_SHOWN.

Also document this flag in object.h in the table above FLAG_BITS.

> +static enum list_objects_filter_result filter_blobs_limit(
> + enum list_objects_filter_type filter_type,
> + struct object *obj,
> + const char *pathname,
> + const char *filename,
> + void *filter_data_)
> +{
> + struct filter_blobs_limit_data *filter_data = filter_data_;
> + unsigned long object_length;
> + enum object_type t;
> + int is_special_filename;
> +
> + switch (filter_type) {
> + default:
> +  

Re: What's cooking in git.git (Oct 2017, #07; Mon, 30)

2017-11-01 Thread Jonathan Tan
On Wed, 01 Nov 2017 10:21:20 +0900
Junio C Hamano  wrote:

> Jeff Hostetler  writes:
> 
> >> Yes, that, together with the expectation that I will hear from both you 
> >> and JTan 
> >> once the result of combined effort becomes ready to replace this 
> >> placeholder, 
> >> matches my assumption.
> >> 
> >> Is that happening now?
> >
> > Yes, I'm merging them now and hope to have a version to
> > send to Jonathan and/or the list sometime this week.
> 
> Thanks.

Junio, would you prefer that the combined effort be in one single patch
series or separated out into 3? The way I see it, there are two
independent patch series - this one (object filter support in rev-list
and pack-objects) and my one (repo extension for partial clone, fsck,
and gc), and one patch series that depends on these two.

I prefer to have smaller patch series as a patch author (for example,
less need to update later patches due to a design or API issue caught in
review), as a reviewer (for example, less things to keep in mind as I
review things patch by patch), and as an experimenter (assuming that
smaller patch series go through the "next" -> "master" process faster,
this means fewer locally-applied patches to juggle when testing out a
new feature, as Jonathan Nieder said [1]).

I understand if you prefer each patch set to have at least one useful
self-contained feature (or bug fix), though. (Having said that, I think
that this patch set, at least - object filter support in rev-list and
pack-objects - is useful enough on its own, but I understand if it
doesn't reach the bar.)

[1] 
https://public-inbox.org/git/20171030222726.g26nryjxktyj2...@aiede.mtv.corp.google.com/


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
> OK, thanks for working well together.  So does this (1) build on
> Jonathan's fsck-squelching series, or (2) ignores that and builds
> filtering first, potentially leaving the codebase to a broken state
> where it can create fsck-unclean repository until Jonathan's series
> is rebased on top of this, or (3) something else?  [*1*]

Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.

Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.

[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/

> I also saw a patch marked as "this is from Jonathan's earlier work",
> taking the authorship (which to me implies that the changes were
> extensive enough), so I am a bit at loss envisioning how this piece
> fits in the bigger picture together with the other piece.

The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler <g...@jeffhostetler.com> wrote:
> From: Jeff Hostetler <jeffh...@microsoft.com>
>
> I've been working with Jonathan Tan to combine our partial clone
> proposals.  This patch series represents a first step in that effort
> and introduces an object filtering mechanism to select unwanted
> objects.
>
> [1] traverse_commit_list and list-objects is extended to allow
> various filters.
> [2] rev-list is extended to expose filtering.  This allows testing
> of the filtering options.  And can be used later to predict
> missing objects before commands like checkout or merge.
> [3] pack-objects is extended to use filtering parameters and build
> packfiles that omit unwanted objects.
>
> This patch series lays the ground work for subsequent parts which
> will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.

Thanks - I've taken a look at all of them except for the partialclone
extension one, which I've only glanced over briefly. Apart from some
style issues (indentation and typedef-ing of enums) I think that they
generally look all right.

One possible issue with using a formatted filter string (e.g.
blob:none) directly passed to the server as opposed to actual
client-interpreted flags (e.g. --blob-byte-limit=0) is that a user may
be confused if the version of Git they're using supports fancier
filters, which will work if they're running rev-list locally, but not
when they're fetching from a not-so-fancy Git server. Maybe that is
fine, though - something we've discussed internally is an ability to
offload some calculations (e.g. git log -S) to Git servers, and if we
end up doing something similar to that, users will need to deal with
this problem anyway.

The reason why I only glanced over the partialclone patch is because I
think that that change needs more discussion than the rest, and it
would be good to get the others in first. I know that you included the
partialclone patch because it is needed by the rev-list one, but as I
commented in the rev-list one, I think that it does not need to be
aware of partial clones, at least for now. The overall ideas in the
partialclone patch seem good, though - in particular, one conceptual
difference from my patch [1] is that the filter setting is a property
of the repository as opposed to the remote, which does seem like an
improvement, because it makes it easier to make and explain e.g. a
"git log --use-repo-filter -S" command that uses the preconfigured
config.

[1] 
https://public-inbox.org/git/407a298b52a9e0a2ee4135fe844e35b9a14c0f7b.1506714999.git.jonathanta...@google.com/


Re: [PATCH 10/13] rev-list: add list-objects filtering support

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
>  static void finish_object(struct object *obj, const char *name, void 
> *cb_data)
>  {
> struct rev_list_info *info = cb_data;
> -   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
> +   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
> +   if (arg_print_missing) {
> +   list_objects_filter_map_insert(
> +   &missing_objects, &obj->oid, name, obj->type);
> +   return;
> +   }
> +
> +   /*
> +* Relax consistency checks when we expect missing
> +* objects because of partial-clone or a previous
> +* partial-fetch.
> +*
> +* Note that this is independent of any filtering that
> +* we are doing in this run.
> +*/
> +   if (is_partial_clone_registered())
> +   return;
> +
> die("missing blob object '%s'", oid_to_hex(>oid));

I'm fine with arg_print_missing suppressing lazy fetching (when I
rebase my patches on this, I'll have to ensure that fetch_if_missing
is set to 0 if arg_print_missing is true), but I think that the
behavior when arg_print_missing is false should be the opposite - we
should let has_object_file() perform the lazy fetching, and die if it
returns false (that is, if the fetching failed).
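
In code, what I mean is something like this (a sketch; it assumes
fetch_if_missing is cleared when arg_print_missing is set, so that
has_object_file() below performs the lazy fetch in the normal case):

	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
		if (arg_print_missing) {
			list_objects_filter_map_insert(
				&missing_objects, &obj->oid, name, obj->type);
			return;
		}
		/* The lazy fetch already ran and failed, so die. */
		die("missing blob object '%s'", oid_to_hex(&obj->oid));
	}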

> +   }
> if (info->revs->verify_objects && !obj->parsed && obj->type != 
> OBJ_COMMIT)
> parse_object(&obj->oid);
>  }


Re: [PATCH 08/13] list-objects: add traverse_commit_list_filtered method

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> +void traverse_commit_list_filtered(
> +   struct list_objects_filter_options *filter_options,
> +   struct rev_info *revs,
> +   show_commit_fn show_commit,
> +   show_object_fn show_object,
> +   list_objects_filter_map_foreach_cb print_omitted_object,
> +   void *show_data);

So the function call chain, if we wanted a filtered traversal, is:
traverse_commit_list_filtered -> traverse_commit_list__sparse_path
(and friends, and each algorithm is in its own file) ->
traverse_commit_list_worker

This makes the implementation of each algorithm more easily understood
(since they are all in their own files), but also increases the number
of global functions and code files. I personally would combine the
traverse_commit_list__* functions into one file
(list-objects-filtered.c), make them static, and also put
traverse_commit_list_filtered in there, but I understand that other
people in the Git project may differ on this.


Re: [PATCH 07/13] list-objects-filter-options: common argument parsing

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> + * <arg> ::= blob:none
> + *   blob:limit:<n>[kmg]
> + *   sparse:oid:<oid-ish>
> + *   sparse:path:<path>

I notice in the code below that there are some usages of "=" instead
of ":" - could you clarify which one it is? (Ideally this would point
to one point of documentation which serves as both user and technical
documentation.)

> + */
> +int parse_list_objects_filter(struct list_objects_filter_options 
> *filter_options,
> + const char *arg)
> +{
> +   struct object_context oc;
> +   struct object_id sparse_oid;
> +   const char *v0;
> +   const char *v1;
> +
> +   if (filter_options->choice)
> +   die(_("multiple object filter types cannot be combined"));
> +
> +   /*
> +* TODO consider rejecting 'arg' if it contains any
> +* TODO injection characters (since we might send this
> +* TODO to a sub-command or to the server and we don't
> +* TODO want to deal with legacy quoting/escaping for
> +* TODO a new feature).
> +*/
> +
> +   filter_options->raw_value = strdup(arg);
> +
> +   if (skip_prefix(arg, "blob:", ) || skip_prefix(arg, "blobs:", 
> )) {

I know that some people prefer leniency, but I think it's better to
standardize on one form ("blob" instead of both "blob" and "blobs").

> +   if (!strcmp(v0, "none")) {
> +   filter_options->choice = LOFC_BLOB_NONE;
> +   return 0;
> +   }
> +
> +   if (skip_prefix(v0, "limit=", ) &&
> +   git_parse_ulong(v1, _options->blob_limit_value)) {
> +   filter_options->choice = LOFC_BLOB_LIMIT;
> +   return 0;
> +   }
> +   }
> +   else if (skip_prefix(arg, "sparse:", )) {
> +   if (skip_prefix(v0, "oid=", )) {
> +   filter_options->choice = LOFC_SPARSE_OID;
> +   if (!get_oid_with_context(v1, GET_OID_BLOB,
> + &sparse_oid, &oc)) {
> +   /*
> +* We successfully converted the <oid-ish>
> +* into an actual OID.  Rewrite the raw_value
> +* in canonical form with just the OID.
> +* (If we send this request to the server, we
> +* want an absolute expression rather than a
> +* local-ref-relative expression.)
> +*/
> +   free((char *)filter_options->raw_value);
> +   filter_options->raw_value =
> +   xstrfmt("sparse:oid=%s",
> +   oid_to_hex(&sparse_oid));
> +   filter_options->sparse_oid_value =
> +   oiddup(&sparse_oid);
> +   } else {
> +   /*
> +* We could not turn the <oid-ish> into an
> +* OID.  Leave the raw_value as is in case
> +* the server can parse it.  (It may refer to
> +* a branch, commit, or blob we don't have.)
> +*/
> +   }
> +   return 0;
> +   }
> +
> +   if (skip_prefix(v0, "path=", )) {
> +   filter_options->choice = LOFC_SPARSE_PATH;
> +   filter_options->sparse_path_value = strdup(v1);
> +   return 0;
> +   }
> +   }
> +
> +   die(_("invalid filter expression '%s'"), arg);
> +   return 0;
> +}
> +
> +int opt_parse_list_objects_filter(const struct option *opt,
> + const char *arg, int unset)
> +{
> +   struct list_objects_filter_options *filter_options = opt->value;
> +
> +   assert(arg);
> +   assert(!unset);
> +
> +   return parse_list_objects_filter(filter_options, arg);
> +}
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> new file mode 100644
> index 000..23bd68e
> --- /dev/null
> +++ b/list-objects-filter-options.h
> @@ -0,0 +1,50 @@
> +#ifndef LIST_OBJECTS_FILTER_OPTIONS_H
> +#define LIST_OBJECTS_FILTER_OPTIONS_H
> +
> +#include "parse-options.h"
> +
> +/*
> + * Common declarations and utilities for filtering objects (such as omitting
> + * large blobs) in list_objects:traverse_commit_list() and git-rev-list.
> + */
> +
> +enum list_objects_filter_choice {
> +   LOFC_DISABLED = 0,
> +   LOFC_BLOB_NONE,
> +   LOFC_BLOB_LIMIT,
> +   LOFC_SPARSE_OID,
> +   LOFC_SPARSE_PATH,
> +};
> +
> +struct list_objects_filter_options {
> +   /*
> +

Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:

> +enum list_objects_filter_result {
> +   LOFR_ZERO  = 0,
> +   LOFR_MARK_SEEN = 1<<0,

Probably worth documenting, something like /* Mark this object so that
it is skipped for the rest of the traversal. */

> +   LOFR_SHOW  = 1<<1,

And something like /* Invoke show_object_fn on this object. This
object may be revisited unless LOFR_MARK_SEEN is also set. */

> +};
> +
> +/* See object.h and revision.h */
> +#define FILTER_REVISIT (1<<25)

I think this should be declared closer to its use - in the sparse
filter code or in the file that uses it. Wherever it is, also update
the chart in object.h to indicate that we're using this 25th bit.

> +
> +enum list_objects_filter_type {
> +   LOFT_BEGIN_TREE,
> +   LOFT_END_TREE,
> +   LOFT_BLOB
> +};
> +
> +typedef enum list_objects_filter_result list_objects_filter_result;
> +typedef enum list_objects_filter_type list_objects_filter_type;

I don't think we typedef enums in Git code.

> +
> +typedef list_objects_filter_result (*filter_object_fn)(
> +   list_objects_filter_type filter_type,
> +   struct object *obj,
> +   const char *pathname,
> +   const char *filename,
> +   void *filter_data);
> +
> +void traverse_commit_list_worker(
> +   struct rev_info *,
> +   show_commit_fn, show_object_fn, void *show_data,
> +   filter_object_fn filter, void *filter_data);

I think things would be much clearer if a default filter was declared
(matching the behavior as of this patch when filter == NULL), say:
static inline default_filter(args) { switch(filter_type) { case
LOFT_BEGIN_TREE: return LOFR_MARK_SEEN | LOFR_SHOW; case
LOFT_END_TREE: return LOFR_ZERO; ...

And inline traverse_commit_list() instead of putting it in the .c file.

This would reduce or eliminate the need to document
traverse_commit_list_worker, including what happens if filter is NULL,
and explain how a user would make their own filter_object_fn.
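
For concreteness, such a default could look something like this (a
sketch using only the names declared in this patch):

	static inline enum list_objects_filter_result default_filter(
		enum list_objects_filter_type filter_type,
		struct object *obj,
		const char *pathname,
		const char *filename,
		void *filter_data)
	{
		switch (filter_type) {
		case LOFT_BEGIN_TREE:
		case LOFT_BLOB:
			/* Show everything, and never revisit. */
			return LOFR_MARK_SEEN | LOFR_SHOW;
		case LOFT_END_TREE:
		default:
			return LOFR_ZERO;
		}
	}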

> +
> +#endif /* LIST_OBJECTS_H */
> --
> 2.9.3
>


Re: [PATCH 3/5] ssh: 'auto' variant to select between 'ssh' and 'simple'

2017-10-23 Thread Jonathan Tan
On Mon, 23 Oct 2017 15:51:06 -0700
Brandon Williams  wrote:

> On 10/23, Jonathan Nieder wrote:
> > Separately from how to document it, what do you think a good behavior
> > would be?  Should the "auto" configuration trigger command line based
> > detection just like no configuration at all?  Should the "auto" value
> > for configuration be removed and that behavior restricted to the
> > no-configuration case?
> > 
> > I'm tempted to go with the former, which would look like the following.
> > What do you think?
> 
> As a user, having some variant as 'auto' doesn't make much sense; I mean,
> isn't that exactly what the default behavior is?

So you're suggesting the second option ("that behavior restricted to the
no-configuration case")?

I'm leaning towards supporting "auto", actually. At the very least, it
gives the user a clear way to override an existing config.

> Check if my ssh
> command matches existing variants and go with that.  What you are
> proposing is the make the existing auto detection better (yay!) though I
> don't know if it warrants adding a new variant all together.
> 
> Instead it may be better to stick this new improved detection at the end
> of the existing variant discovery function 'determine_ssh_variant()' as
> a last ditch effort to figure out the variant.  That way we don't have
> an extra variant type that can be configured and eliminates some of the
> additional code in the switch statements to handle that enum value
> (though that isn't really that big of a deal).

This sounds like what is already being done in the code.

> > If this looks good, I can reroll in a moment.

Yes, this looks good.


Re: [PATCH 3/5] ssh: 'auto' variant to select between 'ssh' and 'simple'

2017-10-23 Thread Jonathan Tan
On Mon, 23 Oct 2017 14:31:59 -0700
Jonathan Nieder  wrote:

> @@ -2083,14 +2083,19 @@ visited as a result of a redirection do not 
> participate in matching.
>  ssh.variant::
>   Depending on the value of the environment variables `GIT_SSH` or
>   `GIT_SSH_COMMAND`, or the config setting `core.sshCommand`, Git
> - auto-detects whether to adjust its command-line parameters for use
> - with ssh (OpenSSH), plink or tortoiseplink, as opposed to the default
> - (simple).
> + auto-detects whether to pass command-line parameters for use
> + with a simple wrapper script (simple), OpenSSH (ssh), plink, or
> + tortoiseplink.
> ++
> +The default is `auto`, which means to auto-detect whether the ssh command
> +implements OpenSSH options using the `-G` (print configuration) option.
> +If the ssh command supports OpenSSH options, it then behaves like `ssh`;
> +otherwise, it behaves like `simple`.
>  +
>  The config variable `ssh.variant` can be set to override this auto-detection;
> -valid values are `ssh`, `simple`, `plink`, `putty` or `tortoiseplink`. Any
> -other value will be treated as normal ssh. This setting can be overridden via
> -the environment variable `GIT_SSH_VARIANT`.
> +valid values are `ssh`, `simple`, `plink`, `putty`, `tortoiseplink`, and
> +`auto`.  Any other value will be treated as normal ssh.  This setting can be
> +overridden via the environment variable `GIT_SSH_VARIANT`.

The new documentation seems to imply that setting ssh.variant (or
GIT_SSH_VARIANT) to "auto" is equivalent to not setting it at all, but
looking at the code, it doesn't seem to be the case (not setting it at
all invokes checking the first word of core.sshCommand, and only uses
VARIANT_AUTO if that check is inconclusive, whereas setting
ssh.variant=auto skips the core.sshCommand check entirely).

Maybe document ssh.variant as follows:

If unset, Git will determine the command-line arguments to use based
on the basename of the configured SSH command (through the
environment variable `GIT_SSH` or `GIT_SSH_COMMAND`, or the config
setting `core.sshCommand`). If the basename is unrecognized, Git
will attempt to detect support of OpenSSH options by first invoking
the configured SSH command with the `-G` (print configuration) flag,
and will subsequently use OpenSSH options (upon success) or no
options besides the host (upon failure).

If set, Git will not do any auto-detection based on the basename of
the configured SSH command. This can be set to `ssh` (OpenSSH
options), `plink`, `putty`, `tortoiseplink`, `simple` (no options
besides the host), or `auto` (the detection with `-G` as described
above). If set to any other value, Git behaves as if this is set to
`ssh`.

(Patches 1, 2, 4, and 5 seem fine to me.)


Re: Fwd: how can I conform if I succeed in sending patch to mailing list

2017-10-11 Thread Jonathan Tan
On Thu, 12 Oct 2017 04:14:18 +0900
小川恭史  wrote:

> Hello, I found a mistake in the documents, fixed it, and sent a patch to the mailing 
> list.
> 
> Sending patches by 'git send-email' with Gmail smtp seemed to be
> successful because CC included my email address and I received it.
> However, I never received email from mailing list. Of course I'm
> subscribing mailing list.
> 
> How can I confirm whether I succeeded in sending the patch to the mailing list?

The easiest way I can think of is to check an online mailing list
archive [1].

I think your patch was received, as you can see in [2].

[1] for example, https://public-inbox.org/git/

[2] https://public-inbox.org/git/?q=aiueogawa217


Re: [PATCH v3 03/10] protocol: introduce protocol extention mechanisms

2017-10-10 Thread Jonathan Tan
On Tue,  3 Oct 2017 13:15:00 -0700
Brandon Williams  wrote:

> +enum protocol_version determine_protocol_version_server(void)
> +{
> + const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> + enum protocol_version version = protocol_v0;
> +
> + /*
> +  * Determine which protocol version the client has requested.  Since
> +  * multiple 'version' keys can be sent by the client, indicating that
> +  * the client is okay to speak any of them, select the greatest version
> +  * that the client has requested.  This is due to the assumption that
> +  * the most recent protocol version will be the most state-of-the-art.
> +  */
> + if (git_protocol) {
> + struct string_list list = STRING_LIST_INIT_DUP;
> + const struct string_list_item *item;
> + string_list_split(&list, git_protocol, ':', -1);
> +
> + for_each_string_list_item(item, &list) {
> + const char *value;
> + enum protocol_version v;
> +
> + if (skip_prefix(item->string, "version=", )) {

After writing some protocol docs [1], I wonder if this is also too
lenient. The code should probably die if a lone "version" (without the
equals sign) is given; a sketch follows the quoted function below.

[1] 
https://public-inbox.org/git/20171010193956.168385-1-jonathanta...@google.com/

> + v = parse_protocol_version(value);
> + if (v > version)
> + version = v;
> + }
> + }
> +
> + string_list_clear(&list, 0);
> + }
> +
> + return version;
> +}
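
Concretely, the loop body could reject a bare "version" with something
like this (a sketch only; the exact message is just a suggestion):

	if (skip_prefix(item->string, "version=", &value)) {
		v = parse_protocol_version(value);
		if (v > version)
			version = v;
	} else if (!strcmp(item->string, "version")) {
		die("invalid request: lone 'version' with no value");
	}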

Also, in your commit title, it is "extension", not "extention".


[PATCH] Documentation: document Extra Parameters

2017-10-10 Thread Jonathan Tan
Document the server support for Extra Parameters, additional information
that the client can send in its first message to the server during a
Git client-server interaction.

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
I noticed that Documentation/technical was not updated in this patch
series. It probably should, and here is a suggestion on how to do it.

Also, I'm not sure what to call the extra sendable information. I'm open
to suggestions for a better name than Extra Parameters.
---
 Documentation/technical/http-protocol.txt |  8 ++
 Documentation/technical/pack-protocol.txt | 43 ++-
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt 
b/Documentation/technical/http-protocol.txt
index 1c561bdd9..a0e45f288 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -219,6 +219,10 @@ smart server reply:
S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
 
+The client may send Extra Parameters (see
+Documentation/technical/pack-protocol.txt) as a colon-separated string
+in the Git-Protocol HTTP header.
+
 Dumb Server Response
 
 Dumb servers MUST respond with the dumb server reply format.
@@ -269,7 +273,11 @@ the C locale ordering.  The stream SHOULD include the 
default ref
 named `HEAD` as the first ref.  The stream MUST include capability
 declarations behind a NUL on the first ref.
 
+The returned response contains "version 1" if "version=1" was sent as an
+Extra Parameter.
+
   smart_reply     =  PKT-LINE("# service=$servicename" LF)
+                     *1("version 1")
                      ref_list
                      "0000"
   ref_list        =  empty_list / non_empty_list
diff --git a/Documentation/technical/pack-protocol.txt 
b/Documentation/technical/pack-protocol.txt
index ed1eae8b8..f9ebfb23e 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -39,6 +39,19 @@ communicates with that invoked process over the SSH 
connection.
 The file:// transport runs the 'upload-pack' or 'receive-pack'
 process locally and communicates with it over a pipe.
 
+Extra Parameters
+
+
+The protocol provides a mechanism in which clients can send additional
+information in its first message to the server. These are called "Extra
+Parameters", and are supported by the Git, SSH, and HTTP protocols.
+
+Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
+
+Servers that receive any such Extra Parameters MUST ignore all
+unrecognized keys. Currently, the only Extra Parameter recognized is
+"version=1".
+
 Git Transport
 -
 
@@ -46,18 +59,25 @@ The Git transport starts off by sending the command and 
repository
 on the wire using the pkt-line format, followed by a NUL byte and a
 hostname parameter, terminated by a NUL byte.
 
-   0032git-upload-pack /project.git\0host=myserver.com\0
+   0033git-upload-pack /project.git\0host=myserver.com\0
+
+The transport may send Extra Parameters by adding an additional NUL
+byte, and then adding one or more NUL-terminated strings:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
 
 --
-   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+   git-proto-request = request-command SP pathname NUL
+  [ host-parameter NUL [ NUL extra-parameters ] ]
request-command   = "git-upload-pack" / "git-receive-pack" /
   "git-upload-archive"   ; case sensitive
pathname  = *( %x01-ff ) ; exclude NUL
host-parameter= "host=" hostname [ ":" port ]
+   extra-parameters  = 1*extra-parameter
+   extra-parameter   = 1*( %x01-ff ) NUL
 --
 
-Only host-parameter is allowed in the git-proto-request. Clients
-MUST NOT attempt to send additional parameters. It is used for the
+host-parameter is used for the
 git-daemon name based virtual hosting.  See --interpolated-path
 option to git daemon, with the %H/%CH format characters.
 
@@ -117,6 +137,12 @@ we execute it without the leading '/'.
 v
ssh u...@example.com "git-upload-pack '~alice/project.git'"
 
+Depending on the value of the `protocol.version` configuration variable,
+Git may attempt to send Extra Parameters as a colon-separated string in
+the GIT_PROTOCOL environment variable. This is done only if
+the `ssh.variant` configuration variable indicates that the ssh command
+supports passing environment variables as an argument.
+
 A few things to remember here:
 
 - The "command name" is spelled with dash (e.g. git-upload-pack), but
@@ -137,11 +163,13 @@ Reference Discovery
 ---
 
 When the client initially connects the ser

Re: [PATCH v3 07/10] connect: tell server that the client understands v1

2017-10-10 Thread Jonathan Tan
On Tue,  3 Oct 2017 13:15:04 -0700
Brandon Williams  wrote:

> 2. ssh://, file://
>Set 'GIT_PROTOCOL' environment variable with the desired protocol
>version.  With the file:// transport, 'GIT_PROTOCOL' can be set
>explicitly in the locally running git-upload-pack or git-receive-pack
>processes.  With the ssh:// transport and OpenSSH compliant ssh
>programs, 'GIT_PROTOCOL' can be sent across ssh by using '-o
>SendEnv=GIT_PROTOCOL' and having the server whitelist this
>environment variable.

In your commit message, also mention what GIT_PROTOCOL contains
(version=?). (From this commit message alone, I would have expected a
lone integer, but that is not the case.)

Same comment for the commit message of PATCH v3 08/10.


Re: [PATCH v3 05/10] upload-pack, receive-pack: introduce protocol version 1

2017-10-10 Thread Jonathan Tan
On Tue,  3 Oct 2017 13:15:02 -0700
Brandon Williams  wrote:

> + switch (determine_protocol_version_server()) {
> + case protocol_v1:
> + if (advertise_refs || !stateless_rpc)
> + packet_write_fmt(1, "version 1\n");
> + /*
> +  * v1 is just the original protocol with a version string,
> +  * so just fall through after writing the version string.
> +  */

Peff sent out at least one patch [1] that reformats fallthrough comments
to be understood by GCC. It's probably worth doing here too.

In this case, I would do the 2-comment system that Peff suggested:

/*
 * v1 is just the original protocol with a version string,
 * so just fall through after writing the version string.
 */
if (advertise_refs || !stateless_rpc)
packet_write_fmt(1, "version 1\n");
/* fallthrough */

(I put the first comment before the code, so it doesn't look so weird.)

[1] 
https://public-inbox.org/git/20170921062541.ew67gyvrmb2ot...@sigill.intra.peff.net/

> + case protocol_v0:
> + break;
> + default:
> + BUG("unknown protocol version");
> + }


Re: [PATCH v3 04/10] daemon: recognize hidden request arguments

2017-10-10 Thread Jonathan Tan
On Tue,  3 Oct 2017 13:15:01 -0700
Brandon Williams  wrote:

>  /*
>   * Read the host as supplied by the client connection.

The return value is probably worth documenting. Something like "Returns
a pointer to the character *after* the NUL byte terminating the host
argument, or extra_args if there is no host argument."

>   */
> -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
> +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int 
> buflen)

[snip]

> +static void parse_extra_args(struct hostinfo *hi, struct argv_array *env,
> +  char *extra_args, int buflen)
> +{
> + const char *end = extra_args + buflen;
> + struct strbuf git_protocol = STRBUF_INIT;
> +
> + /* First look for the host argument */
> + extra_args = parse_host_arg(hi, extra_args, buflen);

This works, but is a bit loose in what it accepts. I think it's better
to be tighter - in particular, if there is no host argument, we
shouldn't be looking for extra args.

> +
> + /* Look for additional arguments placed after a second NUL byte */
> + for (; extra_args < end; extra_args += strlen(extra_args) + 1) {

Assuming that the host argument exists, this for loop should start at
extra_args + 1 to skip the second NUL byte. This currently works
because this code is lenient towards empty strings.

> + const char *arg = extra_args;
> +
> + /*
> +  * Parse the extra arguments, adding most to 'git_protocol'
> +  * which will be used to set the 'GIT_PROTOCOL' envvar in the
> +  * service that will be run.
> +  *
> +  * If there ends up being a particular arg in the future that
> +  * git-daemon needs to parse specifically (like the 'host' arg)
> +  * then it can be parsed here and not added to 'git_protocol'.
> +  */
> + if (*arg) {
> + if (git_protocol.len > 0)
> + strbuf_addch(&git_protocol, ':');
> + strbuf_addstr(&git_protocol, arg);
> + }
> + }

But I think we shouldn't be lenient towards empty strings.
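
Something like this is what I have in mind (a sketch; it assumes the
host argument was present, so extra_args points just past its
terminating NUL):

	/* Skip the second NUL byte that introduces the extra arguments. */
	if (extra_args < end && !*extra_args)
		extra_args++;

	for (; extra_args < end; extra_args += strlen(extra_args) + 1) {
		const char *arg = extra_args;

		if (!*arg)
			die("git-daemon: empty extra argument");
		if (git_protocol.len > 0)
			strbuf_addch(&git_protocol, ':');
		strbuf_addstr(&git_protocol, arg);
	}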

> +
> + if (git_protocol.len > 0)
> + argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
> +  git_protocol.buf);
> + strbuf_release(&git_protocol);
>  }

