[PATCH v3 0/3] Add commit-graph fuzzer and fix buffer overflow

2018-12-07 Thread Josh Steadmon
Ad a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered. Additionally, fix the Makefile instructions for
building fuzzers.

Changes since V2:
  * Avoid pointer arithmetic overflow when checking the graph's chunk
count.
  * Merge the corrupt_graph_and_verify and
corrupt_and_zero_graph_then_verify test functions.

Josh Steadmon (3):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow
  Makefile: correct example fuzz build

 .gitignore  |  1 +
 Makefile|  3 +-
 commit-graph.c  | 67 +
 commit-graph.h  |  3 ++
 fuzz-commit-graph.c | 16 ++
 t/t5318-commit-graph.sh | 15 +++--
 6 files changed, 83 insertions(+), 22 deletions(-)
 create mode 100644 fuzz-commit-graph.c

Range-diff against v2:
1:  af45c2337f ! 1:  675d58ecea commit-graph: fix buffer read-overflow
@@ -22,8 +22,8 @@
 +  uint64_t chunk_offset;
int chunk_repeated = 0;
  
-+  if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
-+  data + graph_size) {
++  if (data + graph_size - chunk_lookup <
++  GRAPH_CHUNKLOOKUP_WIDTH) {
 +  error(_("chunk lookup table entry missing; graph file 
may be incomplete"));
 +  free(graph);
 +  return NULL;
@@ -40,31 +40,34 @@
  --- a/t/t5318-commit-graph.sh
  +++ b/t/t5318-commit-graph.sh
 @@
-   test_i18ngrep "$grepstr" err
- }
+ GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+ GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
  
-+
-+# usage: corrupt_and_zero_graph_then_verify   
 
-+# Manipulates the commit-graph file at  by inserting 
the data,
-+# then zeros the file starting at . Finally, runs
-+# 'git commit-graph verify' and places the output in the file 'err'. 
Tests 'err'
-+# for the given string.
-+corrupt_and_zero_graph_then_verify() {
-+  corrupt_pos=$1
-+  data="${2:-\0}"
-+  zero_pos=$3
-+  grepstr=$4
+-# usage: corrupt_graph_and_verify   
++# usage: corrupt_graph_and_verify[]
+ # Manipulates the commit-graph file at the position
+-# by inserting the data, then runs 'git commit-graph verify'
++# by inserting the data, optionally zeroing the file
++# starting at , then runs 'git commit-graph verify'
+ # and places the output in the file 'err'. Test 'err' for
+ # the given string.
+ corrupt_graph_and_verify() {
+   pos=$1
+   data="${2:-\0}"
+   grepstr=$3
 +  orig_size=$(stat --format=%s $objdir/info/commit-graph)
-+  cd "$TRASH_DIRECTORY/full" &&
-+  test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
-+  cp $objdir/info/commit-graph commit-graph-backup &&
-+  printf "$data" | dd of="$objdir/info/commit-graph" bs=1 
seek="$corrupt_pos" conv=notrunc &&
++  zero_pos=${4:-${orig_size}}
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" 
conv=notrunc &&
 +  truncate --size=$zero_pos $objdir/info/commit-graph &&
 +  truncate --size=$orig_size $objdir/info/commit-graph &&
-+  test_must_fail git commit-graph verify 2>test_err &&
-+  grep -v "^+" test_err >err &&
-+  test_i18ngrep "$grepstr" err
-+}
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err
+   test_i18ngrep "$grepstr" err
+ }
+ 
 +
  test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
@@ -73,9 +76,9 @@
"incorrect checksum"
  '
  
-+test_expect_success 'detect truncated graph' '
-+  corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
-+  $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
++test_expect_success 'detect incorrect chunk count' '
++  corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
++  "chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET
 +'
 +
  test_expect_success 'git fsck (checks commit-graph)' '
2:  7519fc76df = 2:  06a32bfe8b Makefile: correct example fuzz build
-- 
2.20.0.rc2.12.g4c11c11dec



[PATCH v3 1/3] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-07 Thread Josh Steadmon
Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target. Since parse_commit_graph()
is only called by load_commit_graph_one() (and the fuzzer described
below), we omit error messages that would be duplicated by the caller.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  1 +
 commit-graph.c  | 53 ++---
 commit-graph.h  |  3 +++
 fuzz-commit-graph.c | 16 ++
 5 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..07dd410f3c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
void *graph_map;
-   const unsigned char *data, *chunk_lookup;
size_t graph_size;
struct stat st;
-   uint32_t i;
-   struct commit_graph *graph;
+   struct commit_graph *ret;
int fd = git_open(graph_file);
-   uint64_t last_chunk_offset;
-   uint32_t last_chunk_id;
-   uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
 
if (fd < 0)
return NULL;
@@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   ret = parse_commit_graph(graph_map, fd, graph_size);
+
+   if (!ret) {
+   munmap(graph_map, graph_size);
+   close(fd);
+   exit(1);
+   }
+
+   return ret;
+}
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+   size_t graph_size)
+{
+   const unsigned char *data, *chunk_lookup;
+   uint32_t i;
+   struct commit_graph *graph;
+   uint64_t last_chunk_offset;
+   uint32_t last_chunk_id;
+   uint32_t graph_signature;
+   unsigned char graph_version, hash_version;
+
+   if (!graph_map)
+   return NULL;
+
+   if (graph_size < GRAPH_MIN_SIZE)
+   return NULL;
+
data = (const unsigned char *)graph_map;
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("graph signature %X does not match signature %X"),
  graph_signature, GRAPH_SIGNATURE);
-   goto cleanup_fail;
+   return NULL;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("graph version %X does not match version %X"),
  graph_version, GRAPH_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != GRAPH_OID_VERSION) {
error(_("hash version %X does not match version %X"),
  hash_version, GRAPH_OID_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
graph = alloc_commit_graph();
@@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
switch (chunk_id) {
@@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
if (chunk_repeated) {
error(_("chunk id %08x appears multiple times"), 
chunk_id);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
@@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one

[PATCH v3 2/3] commit-graph: fix buffer read-overflow

2018-12-07 Thread Josh Steadmon
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon 
---
 commit-graph.c  | 14 --
 t/t5318-commit-graph.sh | 15 +--
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 07dd410f3c..836d65a1d3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
last_chunk_offset = 8;
chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
-   uint32_t chunk_id = get_be32(chunk_lookup + 0);
-   uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+   uint32_t chunk_id;
+   uint64_t chunk_offset;
int chunk_repeated = 0;
 
+   if (data + graph_size - chunk_lookup <
+   GRAPH_CHUNKLOOKUP_WIDTH) {
+   error(_("chunk lookup table entry missing; graph file 
may be incomplete"));
+   free(graph);
+   return NULL;
+   }
+
+   chunk_id = get_be32(chunk_lookup + 0);
+   chunk_offset = get_be64(chunk_lookup + 4);
+
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5fe21db99f..5b6b44b78e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -366,24 +366,30 @@ GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
-# usage: corrupt_graph_and_verify   
+# usage: corrupt_graph_and_verify[]
 # Manipulates the commit-graph file at the position
-# by inserting the data, then runs 'git commit-graph verify'
+# by inserting the data, optionally zeroing the file
+# starting at , then runs 'git commit-graph verify'
 # and places the output in the file 'err'. Test 'err' for
 # the given string.
 corrupt_graph_and_verify() {
pos=$1
data="${2:-\0}"
grepstr=$3
+   orig_size=$(stat --format=%s $objdir/info/commit-graph)
+   zero_pos=${4:-${orig_size}}
cd "$TRASH_DIRECTORY/full" &&
test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
cp $objdir/info/commit-graph commit-graph-backup &&
printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" 
conv=notrunc &&
+   truncate --size=$zero_pos $objdir/info/commit-graph &&
+   truncate --size=$orig_size $objdir/info/commit-graph &&
test_must_fail git commit-graph verify 2>test_err &&
grep -v "^+" test_err >err
test_i18ngrep "$grepstr" err
 }
 
+
 test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
"graph signature"
@@ -484,6 +490,11 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'detect incorrect chunk count' '
+   corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
+   "chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET
+'
+
 test_expect_success 'git fsck (checks commit-graph)' '
cd "$TRASH_DIRECTORY/full" &&
git fsck &&
-- 
2.20.0.rc2.12.g4c11c11dec



[PATCH v3 3/3] Makefile: correct example fuzz build

2018-12-07 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b72f37c29..bbcfc2bc9f 100644
--- a/Makefile
+++ b/Makefile
@@ -3104,7 +3104,7 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#  FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#  CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #  LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #  fuzz-all
 #
-- 
2.20.0.rc2.12.g4c11c11dec



Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-12-06 Thread Josh Steadmon
On 2018.11.28 16:27, Stefan Beller wrote:
> This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> with all feedback addressed. As it took some time, I'll send it
> without range-diff, but would ask for full review.
> 
> I plan on resending after the next release as this got delayed quite a bit,
> which is why I also rebased it to master.
> 
> Thanks,
> Stefan

I am not very familiar with most of the submodule code, but for what
it's worth, this entire series looks good to me. I'll note that most of
the commits caused some style complaints, but I'll leave it up to your
judgement as to whether they're valid or not.

Reviewed-by: Josh Steadmon 


[PATCH v2 2/3] commit-graph: fix buffer read-overflow

2018-12-06 Thread Josh Steadmon
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon 
---
 commit-graph.c  | 14 --
 t/t5318-commit-graph.sh | 28 
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 07dd410f3c..224a5f161e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
last_chunk_offset = 8;
chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
-   uint32_t chunk_id = get_be32(chunk_lookup + 0);
-   uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+   uint32_t chunk_id;
+   uint64_t chunk_offset;
int chunk_repeated = 0;
 
+   if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
+   data + graph_size) {
+   error(_("chunk lookup table entry missing; graph file 
may be incomplete"));
+   free(graph);
+   return NULL;
+   }
+
+   chunk_id = get_be32(chunk_lookup + 0);
+   chunk_offset = get_be64(chunk_lookup + 4);
+
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5fe21db99f..2503cb0345 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -384,6 +384,29 @@ corrupt_graph_and_verify() {
test_i18ngrep "$grepstr" err
 }
 
+
+# usage: corrupt_and_zero_graph_then_verify   
 
+# Manipulates the commit-graph file at  by inserting the 
data,
+# then zeros the file starting at . Finally, runs
+# 'git commit-graph verify' and places the output in the file 'err'. Tests 
'err'
+# for the given string.
+corrupt_and_zero_graph_then_verify() {
+   corrupt_pos=$1
+   data="${2:-\0}"
+   zero_pos=$3
+   grepstr=$4
+   orig_size=$(stat --format=%s $objdir/info/commit-graph)
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 
seek="$corrupt_pos" conv=notrunc &&
+   truncate --size=$zero_pos $objdir/info/commit-graph &&
+   truncate --size=$orig_size $objdir/info/commit-graph &&
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err &&
+   test_i18ngrep "$grepstr" err
+}
+
 test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
"graph signature"
@@ -484,6 +507,11 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'detect truncated graph' '
+   corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
+   $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
+'
+
 test_expect_success 'git fsck (checks commit-graph)' '
cd "$TRASH_DIRECTORY/full" &&
git fsck &&
-- 
2.20.0.rc2.10.g7519fc76df



[PATCH v2 3/3] Makefile: correct example fuzz build

2018-12-06 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b72f37c29..bbcfc2bc9f 100644
--- a/Makefile
+++ b/Makefile
@@ -3104,7 +3104,7 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#  FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#  CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #  LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #  fuzz-all
 #
-- 
2.20.0.rc2.10.g7519fc76df



[PATCH v2 1/3] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-06 Thread Josh Steadmon
Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target. Since parse_commit_graph()
is only called by load_commit_graph_one() (and the fuzzer described
below), we omit error messages that would be duplicated by the caller.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  1 +
 commit-graph.c  | 53 ++---
 commit-graph.h  |  3 +++
 fuzz-commit-graph.c | 16 ++
 5 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..07dd410f3c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
void *graph_map;
-   const unsigned char *data, *chunk_lookup;
size_t graph_size;
struct stat st;
-   uint32_t i;
-   struct commit_graph *graph;
+   struct commit_graph *ret;
int fd = git_open(graph_file);
-   uint64_t last_chunk_offset;
-   uint32_t last_chunk_id;
-   uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
 
if (fd < 0)
return NULL;
@@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   ret = parse_commit_graph(graph_map, fd, graph_size);
+
+   if (!ret) {
+   munmap(graph_map, graph_size);
+   close(fd);
+   exit(1);
+   }
+
+   return ret;
+}
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+   size_t graph_size)
+{
+   const unsigned char *data, *chunk_lookup;
+   uint32_t i;
+   struct commit_graph *graph;
+   uint64_t last_chunk_offset;
+   uint32_t last_chunk_id;
+   uint32_t graph_signature;
+   unsigned char graph_version, hash_version;
+
+   if (!graph_map)
+   return NULL;
+
+   if (graph_size < GRAPH_MIN_SIZE)
+   return NULL;
+
data = (const unsigned char *)graph_map;
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("graph signature %X does not match signature %X"),
  graph_signature, GRAPH_SIGNATURE);
-   goto cleanup_fail;
+   return NULL;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("graph version %X does not match version %X"),
  graph_version, GRAPH_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != GRAPH_OID_VERSION) {
error(_("hash version %X does not match version %X"),
  hash_version, GRAPH_OID_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
graph = alloc_commit_graph();
@@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
switch (chunk_id) {
@@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
if (chunk_repeated) {
error(_("chunk id %08x appears multiple times"), 
chunk_id);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
@@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one

[PATCH v2 0/3] Add commit-graph fuzzer and fix buffer overflow

2018-12-06 Thread Josh Steadmon
Add a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered. Additionally, fix the Makefile instructions for
building fuzzers.

Changes since V1:
  * Moved the parse_commit_graph() declaration to the header file, since
we don't mind if others use it.
  * Moved some unnecessary comments into commit messages.
  * Fixed some style issues.
  * Added a test case for detecting commit graphs with missing chunk
lookup entries.
  * Ævar's comments on the Makefile made me realize the fuzzer build
instructions were using the wrong variable. Added a new commit to
fix this.

Josh Steadmon (3):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow
  Makefile: correct example fuzz build

 .gitignore  |  1 +
 Makefile|  3 +-
 commit-graph.c  | 67 +
 commit-graph.h  |  3 ++
 fuzz-commit-graph.c | 16 ++
 t/t5318-commit-graph.sh | 28 +
 6 files changed, 98 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-commit-graph.c

Range-diff against v1:
1:  53e62baaa8 ! 1:  0b57ecbe1b commit-graph, fuzz: Add fuzzer for commit-graph
@@ -4,7 +4,9 @@
 
 Breaks load_commit_graph_one() into a new function,
 parse_commit_graph(). The latter function operates on arbitrary 
buffers,
-which makes it suitable as a fuzzing target.
+which makes it suitable as a fuzzing target. Since parse_commit_graph()
+is only called by load_commit_graph_one() (and the fuzzer described
+below), we omit error messages that would be duplicated by the caller.
 
 Adds fuzz-commit-graph.c, which provides a fuzzing entry point
 compatible with libFuzzer (and possibly other fuzzing engines).
@@ -35,17 +37,6 @@
  diff --git a/commit-graph.c b/commit-graph.c
  --- a/commit-graph.c
  +++ b/commit-graph.c
-@@
- #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
- 
-+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-+  size_t graph_size);
-+
-+
- char *get_commit_graph_filename(const char *obj_dir)
- {
-   return xstrfmt("%s/info/commit-graph", obj_dir);
 @@
  struct commit_graph *load_commit_graph_one(const char *graph_file)
  {
@@ -70,7 +61,7 @@
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
 +  ret = parse_commit_graph(graph_map, fd, graph_size);
 +
-+  if (ret == NULL) {
++  if (!ret) {
 +  munmap(graph_map, graph_size);
 +  close(fd);
 +  exit(1);
@@ -79,10 +70,6 @@
 +  return ret;
 +}
 +
-+/*
-+ * This function is intended to be used only from load_commit_graph_one() 
or in
-+ * fuzz tests.
-+ */
 +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 +  size_t graph_size)
 +{
@@ -94,11 +81,9 @@
 +  uint32_t graph_signature;
 +  unsigned char graph_version, hash_version;
 +
-+  /*
-+   * This should already be checked in load_commit_graph_one, but we still
-+   * need a check here for when we're calling parse_commit_graph directly
-+   * from fuzz tests. We can omit the error message in that case.
-+   */
++  if (!graph_map)
++  return NULL;
++
 +  if (graph_size < GRAPH_MIN_SIZE)
 +  return NULL;
 +
@@ -162,12 +147,25 @@
  
  static void prepare_commit_graph_one(struct repository *r, const char 
*obj_dir)
 
+ diff --git a/commit-graph.h b/commit-graph.h
+ --- a/commit-graph.h
+ +++ b/commit-graph.h
+@@
+ 
+ struct commit_graph *load_commit_graph_one(const char *graph_file);
+ 
++struct commit_graph *parse_commit_graph(void *graph_map, int fd,
++  size_t graph_size);
++
+ /*
+  * Return 1 if and only if the repository has a commit-graph
+  * file and generation numbers are computed in that file.
+
  diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
  new file mode 100644
  --- /dev/null
  +++ b/fuzz-commit-graph.c
 @@
-+#include "object-store.h"
 +#include "commit-graph.h"
 +
 +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
@@ -179,9 +177,8 @@
 +{
 +  struct commit_graph *g;
 +
-+  g = parse_commit_graph((void *) data, -1, size);
-+  if (g)
-+  free(g);
++  g = parse_commit_graph((void *)data, -1, size);
++  free(g);
 +
 +  return 0;
 +}
2:  ad2e761f44 ! 2:  af45c2337f commit-graph: fix buffer read-overflow
@@ -22,7 +22,8 @@
 +  uint64_t chunk_offset;
 

Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-05 Thread Josh Steadmon
On 2018.12.05 23:48, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Dec 05 2018, Josh Steadmon wrote:
> 
> > Breaks load_commit_graph_one() into a new function,
> > parse_commit_graph(). The latter function operates on arbitrary buffers,
> > which makes it suitable as a fuzzing target.
> >
> > Adds fuzz-commit-graph.c, which provides a fuzzing entry point
> > compatible with libFuzzer (and possibly other fuzzing engines).
> >
> > Signed-off-by: Josh Steadmon 
> > ---
> >  .gitignore  |  1 +
> >  Makefile|  1 +
> >  commit-graph.c  | 63 +
> >  fuzz-commit-graph.c | 18 +
> >  4 files changed, 66 insertions(+), 17 deletions(-)
> >  create mode 100644 fuzz-commit-graph.c
> >
> > diff --git a/.gitignore b/.gitignore
> > index 0d77ea5894..8bcf153ed9 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -1,3 +1,4 @@
> > +/fuzz-commit-graph
> >  /fuzz_corpora
> >  /fuzz-pack-headers
> >  /fuzz-pack-idx
> > diff --git a/Makefile b/Makefile
> > index 1a44c811aa..6b72f37c29 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
> >
> >  ETAGS_TARGET = TAGS
> >
> > +FUZZ_OBJS += fuzz-commit-graph.o
> >  FUZZ_OBJS += fuzz-pack-headers.o
> >  FUZZ_OBJS += fuzz-pack-idx.o
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 40c855f185..0755359b1a 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -46,6 +46,10 @@
> >  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
> > + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
> >
> > +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> > +   size_t graph_size);
> > +
> > +
> >  char *get_commit_graph_filename(const char *obj_dir)
> >  {
> > return xstrfmt("%s/info/commit-graph", obj_dir);
> > @@ -84,16 +88,10 @@ static int commit_graph_compatible(struct repository *r)
> >  struct commit_graph *load_commit_graph_one(const char *graph_file)
> >  {
> > void *graph_map;
> > -   const unsigned char *data, *chunk_lookup;
> > size_t graph_size;
> > struct stat st;
> > -   uint32_t i;
> > -   struct commit_graph *graph;
> > +   struct commit_graph *ret;
> > int fd = git_open(graph_file);
> > -   uint64_t last_chunk_offset;
> > -   uint32_t last_chunk_id;
> > -   uint32_t graph_signature;
> > -   unsigned char graph_version, hash_version;
> >
> > if (fd < 0)
> > return NULL;
> > @@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char 
> > *graph_file)
> > die(_("graph file %s is too small"), graph_file);
> > }
> > graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
> > +   ret = parse_commit_graph(graph_map, fd, graph_size);
> > +
> > +   if (ret == NULL) {
> 
> Code in git usually uses just !ret.

Will fix in V2, thanks.


> > +   munmap(graph_map, graph_size);
> > +   close(fd);
> > +   exit(1);
> 
> Ouch, exit(1) from load_commit_graph_one()? Can't we return NULL here
> instead? Nasty to exit from a library routine, but then I see later...
> 
> > @@ -201,11 +235,6 @@ struct commit_graph *load_commit_graph_one(const char 
> > *graph_file)
> > }
> >
> > return graph;
> > -
> > -cleanup_fail:
> > -   munmap(graph_map, graph_size);
> > -   close(fd);
> > -   exit(1);
> >  }
> 
> ... ah, I see this is where you got the exit(1) from. So it was there
> already.
> 
> >  static void prepare_commit_graph_one(struct repository *r, const char 
> > *obj_dir)
> > diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> > new file mode 100644
> > index 00..420851d0d2
> > --- /dev/null
> > +++ b/fuzz-commit-graph.c
> > @@ -0,0 +1,18 @@
> > +#include "object-store.h"
> > +#include "commit-graph.h"
> > +
> > +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> > +   size_t graph_size);
> > +
> > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> > +
> > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> > +{
> > +   struct commit_graph *g;
> > +
> > +   g = parse_commit_graph((void *) data, -1, size);
> > +   if (g)
> > +   free(g);
> > +
> > +   return 0;
> > +}
> 
> I hadn't looked at this before, but see your 5e47215080 ("fuzz: add
> basic fuzz testing target.", 2018-10-12) for some prior art.
> 
> There's instructions there for a very long "make" invocation. Would be
> nice if this were friendlier and we could just do "make test-fuzz" or
> something...

Yeah, the problem is that there are too many combinations of fuzzing
engine, sanitizer, and compiler to make any reasonable default here.
Even if you just stick with libFuzzer, address sanitizer, and clang, the
flags change radically depending on which version of clang you're using.


[PATCH 2/2] commit-graph: fix buffer read-overflow

2018-12-05 Thread Josh Steadmon
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon 
Helped-by: Derrick Stolee 
---
 commit-graph.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0755359b1a..fee171a5f3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -175,10 +175,19 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
last_chunk_offset = 8;
chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
-   uint32_t chunk_id = get_be32(chunk_lookup + 0);
-   uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+   uint32_t chunk_id;
+   uint64_t chunk_offset;
int chunk_repeated = 0;
 
+   if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > data + graph_size) 
{
+   error(_("chunk lookup table entry missing; graph file 
may be incomplete"));
+   free(graph);
+   return NULL;
+   }
+
+   chunk_id = get_be32(chunk_lookup + 0);
+   chunk_offset = get_be64(chunk_lookup + 4);
+
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-05 Thread Josh Steadmon
Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  1 +
 commit-graph.c  | 63 +
 fuzz-commit-graph.c | 18 +
 4 files changed, 66 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..0755359b1a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -46,6 +46,10 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+   size_t graph_size);
+
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -84,16 +88,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
void *graph_map;
-   const unsigned char *data, *chunk_lookup;
size_t graph_size;
struct stat st;
-   uint32_t i;
-   struct commit_graph *graph;
+   struct commit_graph *ret;
int fd = git_open(graph_file);
-   uint64_t last_chunk_offset;
-   uint32_t last_chunk_id;
-   uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
 
if (fd < 0)
return NULL;
@@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   ret = parse_commit_graph(graph_map, fd, graph_size);
+
+   if (ret == NULL) {
+   munmap(graph_map, graph_size);
+   close(fd);
+   exit(1);
+   }
+
+   return ret;
+}
+
+/*
+ * This function is intended to be used only from load_commit_graph_one() or in
+ * fuzz tests.
+ */
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+   size_t graph_size)
+{
+   const unsigned char *data, *chunk_lookup;
+   uint32_t i;
+   struct commit_graph *graph;
+   uint64_t last_chunk_offset;
+   uint32_t last_chunk_id;
+   uint32_t graph_signature;
+   unsigned char graph_version, hash_version;
+
+   /*
+* This should already be checked in load_commit_graph_one, but we still
+* need a check here for when we're calling parse_commit_graph directly
+* from fuzz tests. We can omit the error message in that case.
+*/
+   if (graph_size < GRAPH_MIN_SIZE)
+   return NULL;
+
data = (const unsigned char *)graph_map;
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("graph signature %X does not match signature %X"),
  graph_signature, GRAPH_SIGNATURE);
-   goto cleanup_fail;
+   return NULL;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("graph version %X does not match version %X"),
  graph_version, GRAPH_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != GRAPH_OID_VERSION) {
error(_("hash version %X does not match version %X"),
  hash_version, GRAPH_OID_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
graph = alloc_commit_graph();
@@ -152,7 +184,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 

[PATCH 0/2] Add commit-graph fuzzer and fix buffer overflow

2018-12-05 Thread Josh Steadmon
Add a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered.

Josh Steadmon (2):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow

 .gitignore  |  1 +
 Makefile|  1 +
 commit-graph.c  | 76 +
 fuzz-commit-graph.c | 18 +++
 4 files changed, 77 insertions(+), 19 deletions(-)
 create mode 100644 fuzz-commit-graph.c

-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH v5 1/1] protocol: advertise multiple supported versions

2018-11-16 Thread Josh Steadmon
Currently the client advertises that it supports the wire protocol
version set in the protocol.version config. However, not all services
support the same set of protocol versions. For example, git-receive-pack
supports v1 and v0, but not v2. If a client connects to git-receive-pack
and requests v2, it will instead be downgraded to v0. Other services,
such as git-upload-archive, do not do any version negotiation checks.

This patch creates a protocol version registry. Individual client and
server programs register all the protocol versions they support prior to
communicating with a remote instance. Versions should be listed in
preference order; the version specified in protocol.version will
automatically be moved to the front of the registry.

The protocol version registry is passed to remote helpers via the
GIT_PROTOCOL environment variable.

Clients now advertise the full list of registered versions. Servers
select the first allowed version from this advertisement.

Additionally, remove special cases around advertising version=0.
Previously we avoided adding version negotiation fields in server
responses if it looked like the client wanted v0. However, including
these fields does not change behavior, so it's better to have simpler
code.

While we're at it, remove unnecessary externs from function declarations
in protocol.h.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c   |   3 +
 builtin/clone.c |   4 ++
 builtin/fetch-pack.c|   4 ++
 builtin/fetch.c |   5 ++
 builtin/ls-remote.c |   5 ++
 builtin/pull.c  |   5 ++
 builtin/push.c  |   4 ++
 builtin/receive-pack.c  |   3 +
 builtin/send-pack.c |   3 +
 builtin/upload-archive.c|   3 +
 builtin/upload-pack.c   |   4 ++
 connect.c   |  52 +++
 protocol.c  | 122 +---
 protocol.h  |  23 ++-
 remote-curl.c   |  27 +---
 t/t5551-http-fetch-smart.sh |   1 +
 t/t5570-git-daemon.sh   |   2 +-
 t/t5601-clone.sh|  38 +--
 t/t5700-protocol-v1.sh  |   8 +--
 t/t5702-protocol-v2.sh  |  16 +++--
 transport-helper.c  |   6 ++
 21 files changed, 256 insertions(+), 82 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..8adb9f381b 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -8,6 +8,7 @@
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   register_allowed_protocol_version(protocol_v0);
+
argc = parse_options(argc, argv, prefix, local_opts, NULL,
 PARSE_OPT_KEEP_ALL);
 
diff --git a/builtin/clone.c b/builtin/clone.c
index fd2c3ef090..1651a950b6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec rs = REFSPEC_INIT_FETCH;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
fetch_if_missing = 0;
 
packet_trace_identity("clone");
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 1a1bc63566..cba935e4d3 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
 
fetch_if_missing = 0;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..2a20cf8bfd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -21,6 +21,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "protocol.h"
 #include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
@@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
int prune_tags_ok = 1;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch");
 
fetch_if_missing = 0;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 1a25df7ee1..ea685e8bb9 100644
-

[PATCH v5 0/1] Advertise multiple supported proto versions

2018-11-16 Thread Josh Steadmon
Changes from V4: remove special cases around advertising version=0.

Add a registry of supported wire protocol versions that individual
commands can use to declare supported versions before contacting a
server. The client will then advertise all supported versions, while the
server will choose the first allowed version from the advertised
list.

Every command that acts as a client or server must now register its
supported protocol versions.


Josh Steadmon (1):
  protocol: advertise multiple supported versions

 builtin/archive.c   |   3 +
 builtin/clone.c |   4 ++
 builtin/fetch-pack.c|   4 ++
 builtin/fetch.c |   5 ++
 builtin/ls-remote.c |   5 ++
 builtin/pull.c  |   5 ++
 builtin/push.c  |   4 ++
 builtin/receive-pack.c  |   3 +
 builtin/send-pack.c |   3 +
 builtin/upload-archive.c|   3 +
 builtin/upload-pack.c   |   4 ++
 connect.c   |  52 +++
 protocol.c  | 122 +---
 protocol.h  |  23 ++-
 remote-curl.c   |  27 +---
 t/t5551-http-fetch-smart.sh |   1 +
 t/t5570-git-daemon.sh   |   2 +-
 t/t5601-clone.sh|  38 +--
 t/t5700-protocol-v1.sh  |   8 +--
 t/t5702-protocol-v2.sh  |  16 +++--
 transport-helper.c  |   6 ++
 21 files changed, 256 insertions(+), 82 deletions(-)

Range-diff against v4:
1:  3c023991c0 ! 1:  60f6f2fbd8 protocol: advertise multiple supported versions
@@ -21,6 +21,12 @@
 Clients now advertise the full list of registered versions. Servers
 select the first allowed version from this advertisement.
 
+Additionally, remove special cases around advertising version=0.
+Previously we avoided adding version negotiation fields in server
+responses if it looked like the client wanted v0. However, including
+these fields does not change behavior, so it's better to have simpler
+code.
+
 While we're at it, remove unnecessary externs from function 
declarations
 in protocol.h.
 
@@ -245,18 +251,21 @@
  {
struct child_process *conn;
 @@
+   prog, path, 0,
target_host, 0);
  
-   /* If using a new version put that stuff here after a second null byte 
*/
+-  /* If using a new version put that stuff here after a second null byte 
*/
 -  if (version > 0) {
-+  if (strcmp(version_advert->buf, "version=0")) {
-   strbuf_addch(, '\0');
+-  strbuf_addch(, '\0');
 -  strbuf_addf(, "version=%d%c",
 -  version, '\0');
-+  strbuf_addf(, "%s%c", version_advert->buf, '\0');
-   }
+-  }
++  /* Add version fields after a second null byte */
++  strbuf_addch(, '\0');
++  strbuf_addf(, "%s%c", version_advert->buf, '\0');
  
packet_write(fd[1], request.buf, request.len);
+ 
 @@
   */
  static void push_ssh_options(struct argv_array *args, struct argv_array 
*env,
@@ -264,9 +273,9 @@
 -   enum protocol_version version, int flags)
 +   const struct strbuf *version_advert, int flags)
  {
-   if (variant == VARIANT_SSH &&
+-  if (variant == VARIANT_SSH &&
 -  version > 0) {
-+  strcmp(version_advert->buf, "version=0")) {
++  if (variant == VARIANT_SSH) {
argv_array_push(args, "-o");
argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
 -  argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
@@ -346,13 +355,13 @@
 -  if (version > 0) {
 -  argv_array_pushf(>env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
 -   version);
-+  if (strcmp(version_advert.buf, "version=0")) {
-+  argv_array_pushf(>env_array,
-+   GIT_PROTOCOL_ENVIRONMENT "=%s",
-+   version_advert.buf);
-   }
+-  }
++  argv_array_pushf(>env_array,
++   GIT_PROTOCOL_ENVIRONMENT "=%s",
++   version_advert.buf);
}
argv_array_push(>args, cmd.buf);
+ 
 
  diff --git a/protocol.c b/protocol.c
  --- a/protocol.c
@@ -573,8 +582,7 @@
 +static int get_client_protocol_http_header(const struct strbuf 
*version_advert,
 + struct strbuf *header)
 +{
-+  if (version_advert

Re: [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:48, Jeff King wrote:
> In a v2 smart-http conversation, the server should reply to our initial
> request with a pkt-line saying "version 2" (this is the start of the
> "capabilities advertisement"). We check for the string using
> starts_with(), but that's overly permissive (it would match "version
> 20", for example).
> 
> Let's tighten this check to use strcmp(). Note that we don't need to
> worry about a trailing newline here, because the ptk-line code will have
> chomped it for us already.
> 
> Signed-off-by: Jeff King 
> ---
>  remote-curl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index dd9bc41aa1..3c9c4a07c3 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const 
> char *service,
>   d->len = src_len;
>   d->proto_git = 1;
>  
> - } else if (starts_with(line, "version 2")) {
> + } else if (!strcmp(line, "version 2")) {
>   /*
>* v2 smart http; do not consume version packet, which will
>* be handled elsewhere.
> -- 
> 2.19.1.1636.gc7a073d580
> 

Looks good to me.

Reviewed-by: Josh Steadmon 


Re: [PATCH 1/3] remote-curl: refactor smart-http discovery

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:47, Jeff King wrote:
> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> 
>   1. For v0, we require that the content-type indicates a smart-http
>  response. We also require the response to look vaguely like a
>  pkt-line starting with "#". If one of those does not match, we fall
>  back to dumb-http.
> 
>  But according to our http protocol spec[1]:
> 
>Dumb servers MUST NOT return a return type starting with
>`application/x-git-`.
> 
>  If we see the expected content-type, we should consider it
>  smart-http. At that point we can parse the pkt-line for real, and
>  complain if it is not syntactically valid.
> 
>   2. For v2, we do not actually check the content-type. Our v2 protocol
>  spec says[2]:
> 
>When using the http:// or https:// transport a client makes a
>"smart" info/refs request as described in `http-protocol.txt`[...]
> 
>  and the http spec is clear that for a smart-http[3]:
> 
>The Content-Type MUST be `application/x-$servicename-advertisement`.
> 
>  So it is required according to the spec.
> 
> These inconsistencies were easy to miss because of the way the original
> code was written as an inline conditional. Let's pull it out into its
> own function for readability, and improve a few things:
> 
>  - we now predicate the smart/dumb decision entirely on the presence of
>the correct content-type
> 
>  - we do a real pkt-line parse before deciding how to proceed (and die
>if it isn't valid)
> 
>  - use skip_prefix() for comparing service strings, instead of
>constructing expected output in a strbuf; this avoids dealing with
>memory cleanup
> 
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
> 
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
> 
> Helped-by: Josh Steadmon 
> Signed-off-by: Jeff King 
> ---
>  remote-curl.c | 93 ---
>  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..dd9bc41aa1 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum 
> protocol_version version,
>   return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +  struct strbuf *type)
> +{
> + char *src_buf;
> + size_t src_len;
> + char *line;
> + const char *p;
> +
> + /*
> +  * If we don't see x-$service-advertisement, then it's not smart-http.
> +  * But once we do, we commit to it and assume any other protocol
> +  * violations are hard errors.
> +  */
> + if (!skip_prefix(type->buf, "application/x-", ) ||
> + !skip_prefix(p, service, ) ||
> + strcmp(p, "-advertisement"))
> + return;
> +
> + /*
> +  * "Peek" at the first packet by using a separate buf/len pair; some
> +  * cases below require us leaving the originals intact.
> +  */
> + src_buf = d->buf;
> + src_len = d->len;
> + line = packet_read_line_buf(_buf, _len, NULL);
> + if (!line)
> + die("invalid server response; expected service, got flush 
> packet");
> +
> + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) {
> + /*
> +  * The header can include additional metadata lines, up
> +  * until a packet flush marker.  Ignore these now, but
> +  * in the future we might start to scan them.
> +  */
> + while (packet_read_line_buf(_buf, _len, NULL))
> + ;
> +
> + /*
> +  * v0 smart http; callers expect us to soak up the
> +  * service and header packets
> +  */
> + d->buf = src_buf;
> + d->len = src_len;
> + d->proto_git = 1;
> +
> + } else if (starts_with(line, "version 2")) {
> + /*
> +  * v2 smart http; do not consume version packet, which will
> +  * be handled elsewhere

Re: [PATCH 0/3] remote-curl smart-http discovery cleanup

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:44, Jeff King wrote:
[...]
> Amusingly, this does break the test you just added, because it tries to
> issue an ERR after claiming "text/html" (and after my patch, we
> correctly fall back to dumb-http).

Heh yeah, I copied the script from a dumb-http test without reading the
spec.


Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-16 Thread Josh Steadmon
On 2018.11.16 11:45, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> >> What I was alludding to was a lot simpler, though.  An advert string
> >> "version=0:version=1" from a client that prefers version 0 won't be
> >> !strcmp("version=0", advert) but seeing many strcmp() in the patch
> >> made me wonder.
> >
> > Ah I see. The strcmp()s against "version=0" are special cases for where
> > it looks like the client does not understand any sort of version
> > negotiation. If we see multiple versions listed in the advert, then the
> > rest of the selection logic should do the right thing.
> 
> OK, let me try to see if I understand correctly by rephrasing.
> 
> If the client does not say anything about which version it prefers
> (because it only knows about version 0 without even realizing that
> there is a version negotiation exchange), we substitute the lack of
> proposed versions with string "version=0", and the strcmp()s I saw
> and was puzzled by are all about special casing such a client.  But
> we would end up behaving the same way (at least when judged only by
> externally visible effects) to a client that is aware of version
> negotiation and tells us it prefers version 0 (and nothing else)
> with the selection logic anyway.
> 
> Did I get it right?  If so, yeah, I think it makes sense to avoid
> two codepaths that ends up doing the same thing by removing the
> special case.

Yes, that is correct. The next version will remove the special cases
here.

> > However, I think that it might work to remove the special cases. In the
> > event that the client is so old that it doesn't understand any form of
> > version negotiation, it should just ignore the version fields /
> > environment vars. If you think it's cleaner to remove the special cases,
> > let me know.


Re: [PATCH] remote-curl: die on server-side errors

2018-11-15 Thread Josh Steadmon
On 2018.11.14 02:00, Jeff King wrote:
> On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:
> 
> > Yes, the packet_read_line_buf() interface will both advance the buf
> > pointer and decrement the length.  So if we want to "peek", we have to
> > do so with a copy (there's a peek function if you use the packet_reader
> > interface, but that might be overkill here).
> > 
> > You can rewrite it like this, which is a pretty faithful conversion and
> > passes the tests (but see below).
> > [...]
> 
> Here's a version which is less faithful, but I think does sensible
> things in all cases, and is much easier to follow. I get a little
> nervous just because it tightens some cases, and one never knows if
> other implementations might be relying on the looseness. E.g.:
> 
>   - in the current code we'd still drop back to dumb http if the server
> tells us "application/x-git-upload-pack" but the initial pktline
> doesn't start with "#" (even though if it _does_ have "#" at
> position 5 but isn't a valid pktline, we'd complain!)
> 
>   - right now the v2 protocol does not require the server to say
> "application/x-git-upload-pack" for the content-type
> 
> This patch tightens both of those (I also made a few stylistic tweaks,
> and added the ERR condition to show where it would go). I dunno. Part of
> me sees this as a nice cleanup, but maybe it is better to just leave it
> alone. A lot of these behaviors are just how it happens to work now, and
> not part of the spec, but we don't know what might be relying on them.

At least according to the protocol-v2 and http-protocol docs, the
stricter behavior seems correct:

For the first point above, dumb servers should never use an
"application/x-git-*" content type (http-protocol.txt line 163-167).

For the second point, the docs require v2 servers to use
"application/x-git-*" content types. protocol-v2.txt lines 63-65 state
that v2 clients should make a smart http request, while
http-protocol.txt lines 247-252 state that a smart server's response
type must be "application/x-git-*".

Of course we don't know if other implementations follow the spec, but
ISTM that this patch at least doesn't contradict how we've promised the
protocols should work.

If no one has any objections, I'll include the diff below in v2. Thanks
for the help Jeff!

> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..1adb96311b 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,61 @@ static int get_protocol_http_header(enum 
> protocol_version version,
>   return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +  struct strbuf *type)
> +{
> + char *src_buf;
> + size_t src_len;
> + char *line;
> + const char *p;
> +
> + if (!skip_prefix(type->buf, "application/x-", ) ||
> + !skip_prefix(p, service, ) ||
> + strcmp(p, "-advertisement"))
> + return;
> +
> + /*
> +  * We speculatively try to read a packet, which means we must preserve
> +  * the original buf/len pair in some cases.
> +  */
> + src_buf = d->buf;
> + src_len = d->len;
> + line = packet_read_line_buf(_buf, _len, NULL);
> + if (!line)
> + die("invalid server response; expected service, got flush 
> packet");
> +
> + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) {
> + /*
> +  * The header can include additional metadata lines, up
> +  * until a packet flush marker.  Ignore these now, but
> +  * in the future we might start to scan them.
> +  */
> + while (packet_read_line_buf(_buf, _len, NULL))
> + ;
> +
> + /*
> +  * v0 smart http; callers expect us to soak up the
> +  * service and header packets
> +  */
> + d->buf = src_buf;
> + d->len = src_len;
> + d->proto_git = 1;
> +
> + } else if (starts_with(line, "version 2")) { /* should be strcmp()? */
> + /*
> +  * v2 smart http; do not consume version packet, which will
> +  * be handled elsewhere.
> +  */
> + d->proto_git = 1;
> + } else if (skip_prefix(line, "ERR ", )) {
> + die(_("remote error: %s"), p);
> + } else {
> + die("invalid server response; got '%s'", line);
> + }
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> - struct strbuf exp = STRBUF_INIT;
>   struct strbuf type = STRBUF_INIT;
>   struct strbuf charset = STRBUF_INIT;
>   struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char 
> *service, int for_push)
>   last->buf_alloc = strbuf_detach(, >len);
>   last->buf = last->buf_alloc;
>  
> - strbuf_addf(, "application/x-%s-advertisement", service);
> - if 

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-14 Thread Josh Steadmon
On 2018.11.14 11:38, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> > On 2018.11.13 13:01, Junio C Hamano wrote:
> >> I am wondering if the code added by this patch outside this
> >> function, with if (strcmp(client_ad.buf, "version=0") sprinkled all
> >> over the place, works sensibly when the other side says "I prefer
> >> version=0 but I do not mind talking version=1".
> >
> > Depends on what you mean by "sensibly" :). In the current case, if the
> > client prefers v0, it will always end up using v0. After the fixes
> > described above, it will always use v0 unless the server no longer
> > supports v0. Is that what you would expect?
> 
> Yes, that sounds like a sensible behaviour.
> 
> What I was alludding to was a lot simpler, though.  An advert string
> "version=0:version=1" from a client that prefers version 0 won't be
> !strcmp("version=0", advert) but seeing many strcmp() in the patch
> made me wonder.

Ah I see. The strcmp()s against "version=0" are special cases for where
it looks like the client does not understand any sort of version
negotiation. If we see multiple versions listed in the advert, then the
rest of the selection logic should do the right thing.

However, I think that it might work to remove the special cases. In the
event that the client is so old that it doesn't understand any form of
version negotiation, it should just ignore the version fields /
environment vars. If you think it's cleaner to remove the special cases,
let me know.


Re: [PATCH v4 0/1] Advertise multiple supported proto versions

2018-11-14 Thread Josh Steadmon
On 2018.11.14 19:22, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> > Fix several bugs identified in v3, clarify commit message, and clean up
> > extern keyword in protocol.h.
> 
> It is good to descirbe the change relative to v3 here, which would
> help those who are interested and reviewed v3.
> 
> To help those who missed the boat and v4 is their first encounter
> with this series, having the description relative to v3 alone and
> nothing else is not very friendly, though.

Ack. Will keep this in mind for future patches.

> > + diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> > + --- a/builtin/upload-archive.c
> > + +++ b/builtin/upload-archive.c
> > +@@
> > + #include "builtin.h"
> > + #include "archive.h"
> > + #include "pkt-line.h"
> > ++#include "protocol.h"
> > + #include "sideband.h"
> > + #include "run-command.h"
> > + #include "argv-array.h"
> > +@@
> > +   if (argc == 2 && !strcmp(argv[1], "-h"))
> > +   usage(upload_archive_usage);
> > + 
> > ++  register_allowed_protocol_version(protocol_v0);
> > ++
> > +   /*
> > +* Set up sideband subprocess.
> > +*
> 
> This one unfortunately seems to interact rather badly with your
> js/remote-archive-v2 topic which is already in 'next', when this
> topic is merged to 'pu', and my attempt to mechanically resolve
> conflicts breaks t5000.

Hmm, should we drop js/remote-archive-v2 then? Any solution that
unblocks js/remote-archive-v2 will almost certainly depend on this
patch.


[PATCH v2] Makefile: use FUZZ_CXXFLAGS for linking fuzzers

2018-11-14 Thread Josh Steadmon
OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in
CFLAGS causes lots of build warnings. Using separate FUZZ_CXXFLAGS
avoids this.

Signed-off-by: Josh Steadmon 
---
Since there's nothing else using CXXFLAGS, let's just make it explicit
that these apply to the fuzzers.


Range-diff against v1:
1:  1630a93f82 < -:  -- Makefile: use CXXFLAGS for linking fuzzers
-:  -- > 1:  6b3d6dd7f0 Makefile: use FUZZ_CXXFLAGS for linking fuzzers

 Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bbfbb4292d..0c05896797 100644
--- a/Makefile
+++ b/Makefile
@@ -3098,14 +3098,16 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#  CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#  FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #  LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #  fuzz-all
 #
+FUZZ_CXXFLAGS ?= $(CFLAGS)
+
 .PHONY: fuzz-all
 
 $(FUZZ_PROGRAMS): all
-   $(QUIET_LINK)$(CXX) $(CFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \
+   $(QUIET_LINK)$(CXX) $(FUZZ_CXXFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \
$(XDIFF_OBJS) $(EXTLIBS) git.o $@.o $(LIB_FUZZING_ENGINE) -o $@
 
 fuzz-all: $(FUZZ_PROGRAMS)
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH v4 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Josh Steadmon
Currently the client advertises that it supports the wire protocol
version set in the protocol.version config. However, not all services
support the same set of protocol versions. For example, git-receive-pack
supports v1 and v0, but not v2. If a client connects to git-receive-pack
and requests v2, it will instead be downgraded to v0. Other services,
such as git-upload-archive, do not do any version negotiation checks.

This patch creates a protocol version registry. Individual client and
server programs register all the protocol versions they support prior to
communicating with a remote instance. Versions should be listed in
preference order; the version specified in protocol.version will
automatically be moved to the front of the registry.

The protocol version registry is passed to remote helpers via the
GIT_PROTOCOL environment variable.

Clients now advertise the full list of registered versions. Servers
select the first allowed version from this advertisement.

While we're at it, remove unnecessary externs from function declarations
in protocol.h.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c|   3 +
 builtin/clone.c  |   4 ++
 builtin/fetch-pack.c |   4 ++
 builtin/fetch.c  |   5 ++
 builtin/ls-remote.c  |   5 ++
 builtin/pull.c   |   5 ++
 builtin/push.c   |   4 ++
 builtin/receive-pack.c   |   3 +
 builtin/send-pack.c  |   3 +
 builtin/upload-archive.c |   3 +
 builtin/upload-pack.c|   4 ++
 connect.c|  47 +++
 protocol.c   | 122 ---
 protocol.h   |  23 +++-
 remote-curl.c|  28 ++---
 t/t5570-git-daemon.sh|   2 +-
 t/t5700-protocol-v1.sh   |   8 +--
 t/t5702-protocol-v2.sh   |  16 +++--
 transport-helper.c   |   6 ++
 19 files changed, 237 insertions(+), 58 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..8adb9f381b 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -8,6 +8,7 @@
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   register_allowed_protocol_version(protocol_v0);
+
argc = parse_options(argc, argv, prefix, local_opts, NULL,
 PARSE_OPT_KEEP_ALL);
 
diff --git a/builtin/clone.c b/builtin/clone.c
index fd2c3ef090..1651a950b6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec rs = REFSPEC_INIT_FETCH;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
fetch_if_missing = 0;
 
packet_trace_identity("clone");
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 1a1bc63566..cba935e4d3 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
 
fetch_if_missing = 0;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..2a20cf8bfd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -21,6 +21,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "protocol.h"
 #include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
@@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
int prune_tags_ok = 1;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch");
 
fetch_if_missing = 0;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 1a25df7ee1..ea685e8bb9 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "protocol.h"
 #include "transport.h"
 #include "ref-filter.h"
 #include "remote.h"
@@ -80,6 +81,10 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
 
memset(_array, 0, sizeof(ref_array));
 
+   register_al

[PATCH v4 0/1] Advertise multiple supported proto versions

2018-11-13 Thread Josh Steadmon
Fix several bugs identified in v3, clarify commit message, and clean up
extern keyword in protocol.h.

Josh Steadmon (1):
  protocol: advertise multiple supported versions

 builtin/archive.c|   3 +
 builtin/clone.c  |   4 ++
 builtin/fetch-pack.c |   4 ++
 builtin/fetch.c  |   5 ++
 builtin/ls-remote.c  |   5 ++
 builtin/pull.c   |   5 ++
 builtin/push.c   |   4 ++
 builtin/receive-pack.c   |   3 +
 builtin/send-pack.c  |   3 +
 builtin/upload-archive.c |   3 +
 builtin/upload-pack.c|   4 ++
 connect.c|  47 +++
 protocol.c   | 122 ---
 protocol.h   |  23 +++-
 remote-curl.c|  28 ++---
 t/t5570-git-daemon.sh|   2 +-
 t/t5700-protocol-v1.sh   |   8 +--
 t/t5702-protocol-v2.sh   |  16 +++--
 transport-helper.c   |   6 ++
 19 files changed, 237 insertions(+), 58 deletions(-)

Range-diff against v3:
1:  b9968e3fb0 ! 1:  3c023991c0 protocol: advertise multiple supported versions
@@ -4,22 +4,25 @@
 
 Currently the client advertises that it supports the wire protocol
 version set in the protocol.version config. However, not all services
-support the same set of protocol versions. When connecting to
-git-receive-pack, the client automatically downgrades to v0 if
-config.protocol is set to v2, but this check is not performed for other
-services.
+support the same set of protocol versions. For example, 
git-receive-pack
+supports v1 and v0, but not v2. If a client connects to 
git-receive-pack
+and requests v2, it will instead be downgraded to v0. Other services,
+such as git-upload-archive, do not do any version negotiation checks.
 
-This patch creates a protocol version registry. Individual operations
-register all the protocol versions they support prior to communicating
-with a server. Versions should be listed in preference order; the
-version specified in protocol.version will automatically be moved to 
the
-front of the registry.
+This patch creates a protocol version registry. Individual client and
+server programs register all the protocol versions they support prior 
to
+communicating with a remote instance. Versions should be listed in
+preference order; the version specified in protocol.version will
+automatically be moved to the front of the registry.
 
 The protocol version registry is passed to remote helpers via the
 GIT_PROTOCOL environment variable.
 
 Clients now advertise the full list of registered versions. Servers
-select the first recognized version from this advertisement.
+select the first allowed version from this advertisement.
+
+While we're at it, remove unnecessary externs from function 
declarations
+in protocol.h.
 
 Signed-off-by: Josh Steadmon 
@@ -165,6 +168,20 @@
git_config(git_push_config, );
argc = parse_options(argc, argv, prefix, options, push_usage, 0);
 
+ diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
+ --- a/builtin/receive-pack.c
+ +++ b/builtin/receive-pack.c
+@@
+ 
+   packet_trace_identity("receive-pack");
+ 
++  register_allowed_protocol_version(protocol_v1);
++  register_allowed_protocol_version(protocol_v0);
++
+   argc = parse_options(argc, argv, prefix, options, receive_pack_usage, 
0);
+ 
+   if (argc > 1)
+
  diff --git a/builtin/send-pack.c b/builtin/send-pack.c
  --- a/builtin/send-pack.c
  +++ b/builtin/send-pack.c
@@ -179,6 +196,42 @@
argc = parse_options(argc, argv, prefix, options, send_pack_usage, 0);
if (argc > 0) {
 
+ diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
+ --- a/builtin/upload-archive.c
+ +++ b/builtin/upload-archive.c
+@@
+ #include "builtin.h"
+ #include "archive.h"
+ #include "pkt-line.h"
++#include "protocol.h"
+ #include "sideband.h"
+ #include "run-command.h"
+ #include "argv-array.h"
+@@
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(upload_archive_usage);
+ 
++  register_allowed_protocol_version(protocol_v0);
++
+   /*
+* Set up sideband subprocess.
+*
+
+ diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
+ --- a/builtin/upload-pack.c
+ +++ b/builtin/upload-pack.c
+@@
+   packet_trace_identity("upload-pack");
+   read_replace_refs = 0;
+ 
++  register_allowed_protocol_version(protocol_v2);
++  register_allowed_protocol_version(protocol_v1);
++  register_allowed_protocol_version(protocol_v0);

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Josh Steadmon
On 2018.11.13 19:28, SZEDER Gábor wrote:
> On Mon, Nov 12, 2018 at 01:49:05PM -0800, stead...@google.com wrote:
> 
> > diff --git a/protocol.c b/protocol.c
> > index 5e636785d1..54d2ab991b 100644
> > --- a/protocol.c
> > +++ b/protocol.c
> 
> > +void get_client_protocol_version_advertisement(struct strbuf *advert)
> > +{
> > +   int tmp_nr = nr_allowed_versions;
> > +   enum protocol_version *tmp_allowed_versions, config_version;
> > +   strbuf_reset(advert);
> > +
> > +   have_advertised_versions_already = 1;
> > +
> > +   config_version = get_protocol_version_config();
> > +   if (config_version == protocol_v0) {
> > +   strbuf_addstr(advert, "version=0");
> > +   return;
> > +   }
> > +
> > +   if (tmp_nr > 0) {
> > +   ALLOC_ARRAY(tmp_allowed_versions, tmp_nr);
> > +   copy_array(tmp_allowed_versions, allowed_versions, tmp_nr,
> > +  sizeof(enum protocol_version));
> > +   } else {
> > +   ALLOC_ARRAY(tmp_allowed_versions, 1);
> > +   tmp_nr = 1;
> > +   tmp_allowed_versions[0] = config_version;
> > +   }
> > +
> > +   if (tmp_allowed_versions[0] != config_version)
> > +   for (int i = 1; i < nr_allowed_versions; i++)
> 
> We don't do C99 yet, thus the declaration of a loop variable like this
> is not allowed and triggers compiler errors.
> 
> > +   if (tmp_allowed_versions[i] == config_version) {
> > +   enum protocol_version swap =
> > +   tmp_allowed_versions[0];
> > +   tmp_allowed_versions[0] =
> > +   tmp_allowed_versions[i];
> > +   tmp_allowed_versions[i] = swap;
> > +   }
> > +
> > +   strbuf_addf(advert, "version=%s",
> > +   format_protocol_version(tmp_allowed_versions[0]));
> > +   for (int i = 1; i < tmp_nr; i++)
> 
> Likewise.
> 
> > +   strbuf_addf(advert, ":version=%s",
> > +   format_protocol_version(tmp_allowed_versions[i]));
> > +}

Sorry about that. Will fix in v4. Out of curiousity, do you have a
config.mak snippet that will make these into errors? I played around
with adding combinations of -ansi, -std=c89, and -pedantic to CFLAGS,
but I couldn't get anything that detect the problem without also
breaking on other parts of the build.


Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Josh Steadmon
On 2018.11.13 13:01, Junio C Hamano wrote:
> stead...@google.com writes:
> 
> > Currently the client advertises that it supports the wire protocol
> > version set in the protocol.version config. However, not all services
> > support the same set of protocol versions. When connecting to
> > git-receive-pack, the client automatically downgrades to v0 if
> > config.protocol is set to v2, but this check is not performed for other
> > services.
> 
> "downgrades to v0 even if ... is set to v2" you mean?  Otherwise it
> is unclear why asking for v2 leads to using v0.

The downgrade on push happens only when the the configured version is
v2. If v1 is requested, no downgrade is triggered. I'll clarify the
commit message in the next version.

> > This patch creates a protocol version registry. Individual operations
> > register all the protocol versions they support prior to communicating
> > with a server. Versions should be listed in preference order; the
> > version specified in protocol.version will automatically be moved to the
> > front of the registry.
> >
> > The protocol version registry is passed to remote helpers via the
> > GIT_PROTOCOL environment variable.
> >
> > Clients now advertise the full list of registered versions. Servers
> > select the first recognized version from this advertisement.
> 
> Makes sense.
> 
> > +void get_client_protocol_version_advertisement(struct strbuf *advert)
> > +{
> > +   int tmp_nr = nr_allowed_versions;
> > +   enum protocol_version *tmp_allowed_versions, config_version;
> > +   strbuf_reset(advert);
> > +
> > +   have_advertised_versions_already = 1;
> > +
> > +   config_version = get_protocol_version_config();
> > +   if (config_version == protocol_v0) {
> > +   strbuf_addstr(advert, "version=0");
> > +   return;
> > +   }
> > +
> > +   if (tmp_nr > 0) {
> > +   ALLOC_ARRAY(tmp_allowed_versions, tmp_nr);
> > +   copy_array(tmp_allowed_versions, allowed_versions, tmp_nr,
> > +  sizeof(enum protocol_version));
> > +   } else {
> > +   ALLOC_ARRAY(tmp_allowed_versions, 1);
> > +   tmp_nr = 1;
> > +   tmp_allowed_versions[0] = config_version;
> > +   }
> > +
> > +   if (tmp_allowed_versions[0] != config_version)
> > +   for (int i = 1; i < nr_allowed_versions; i++)
> > +   if (tmp_allowed_versions[i] == config_version) {
> > +   enum protocol_version swap =
> > +   tmp_allowed_versions[0];
> > +   tmp_allowed_versions[0] =
> > +   tmp_allowed_versions[i];
> > +   tmp_allowed_versions[i] = swap;
> > +   }
> > +
> > +   strbuf_addf(advert, "version=%s",
> > +   format_protocol_version(tmp_allowed_versions[0]));
> > +   for (int i = 1; i < tmp_nr; i++)
> > +   strbuf_addf(advert, ":version=%s",
> > +   format_protocol_version(tmp_allowed_versions[i]));
> > +}
> > +
> 
> So the idea is that the protocols the other end can talk come in
> advert in their preferred order, and we take an intersection of them
> and our "allowed-versions", but the preference is further skewed
> with the "swap" thing if we have our own preference specified via
> config?

We currently don't intersect with our own allowed list, we just accept
the first version that we recognize. This introduces its own version
negotiation bug; if we add v2 push in the future, a new client
talking to an old server would try to use v2 even though the server may
not have the corresponding v2 push implementation. I'll fix this in the
next version.

In any case, the ordering of the server's allowed versions won't matter;
we'll pick the the first version in the client's list which is also
allowed on the server.

> I am wondering if the code added by this patch outside this
> function, with if (strcmp(client_ad.buf, "version=0") sprinkled all
> over the place, works sensibly when the other side says "I prefer
> version=0 but I do not mind talking version=1".

Depends on what you mean by "sensibly" :). In the current case, if the
client prefers v0, it will always end up using v0. After the fixes
described above, it will always use v0 unless the server no longer
supports v0. Is that what you would expect?

> Isn't tmp_allowed_versions[] leaking when we return from this
> function?

Yes, sorry about that. Will fix.


Re: [PATCH] remote-curl: die on server-side errors

2018-11-13 Thread Josh Steadmon
On 2018.11.13 23:30, Junio C Hamano wrote:
> stead...@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> 
> OK, that is a valid thing to worry about.
> 
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> This makes me wonder if discoer_refs() is the only place where we
> ought to be checking for ERR packets but we are not.  Are there
> other places that we also need a similar fix?

Quite possibly; I'll review the other client code to see if there are
similar issues before sending v2.


Re: [PATCH] remote-curl: die on server-side errors

2018-11-13 Thread Josh Steadmon
On 2018.11.13 09:26, Jeff King wrote:
> On Mon, Nov 12, 2018 at 02:44:56PM -0800, stead...@google.com wrote:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> > 
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> Aside from the reformatting of the conditional that Junio mentioned,
> this seems pretty good to me. But while looking at that, I found a few
> things, some old and some new. :)
> 
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 762a55a75f..bb3a86505e 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char 
> > *service, int for_push)
> > } else if (maybe_smart &&
> >last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> > last->proto_git = 1;
> > -   }
> > +   } else if (maybe_smart && last->len > 5 &&
> > +  starts_with(last->buf + 4, "ERR "))
> > +   die(_("remote error: %s"), last->buf + 8);
> 
> The magic "4" here and in the existing "version 2" check is because we
> are expecting pkt-lines. The original conditional always calls
> packed_read_line_buf() and will complain if we didn't actually get a
> pkt-line.
> 
> Should we confirm that we got a real packet-line? Or at least that those
> first four are even plausible hex chars?
> 
> I admit that it's pretty unlikely that the server is going to fool us
> here. It would need something like "foo ERRORS ARE FUN!". And even then
> we'd report an error (whereas the correct behavior is falling back to
> dumb http, but we know that won't work anyway because that's not a valid
> ref advertisement). So I doubt this is really a bug per se, but it might
> make it easier to understand what's going on if we actually parsed the
> packet.

Unfortunately we can't just directly parse the data in last->buf,
because other parts of the code are expecting to see the raw pkt-line
data there. I tried adding a duplicate pointer and length variable for
this data and parsing that through packet_read_line_buf(), but even
without using the results this apparently has side-effects that break
all of t5550 (and probably other tests as well). It also fails if I
completely duplicate last->buf into a new char* and call
packet_readline_buf() on that, so there's clearly some interaction in
the pkt-line guts that I'm not properly accounting for.

> Similarly, we seem eager to accept "version 2" even if we are only
> expecting v0. I know you have another series working in that direction,
> but I don't think it touches this "proto_git". I guess accepting
> "version 2" as "we're speaking git protocol" and then barfing later with
> "wait, I didn't mean to speak v2" is probably OK.
> 
> -Peff


Re: [PATCH] remote-curl: die on server-side errors

2018-11-13 Thread Josh Steadmon
On 2018.11.13 12:02, Junio C Hamano wrote:
> stead...@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> >
> > Signed-off-by: Josh Steadmon 
> > ---
> 
> Forgot to mention one procedural comment.
> 
> As you can see in the To: line of this reply, your MUA is placing
> only the e-mail address without name on your From: line.
> 
> Preferrably I'd like to see the same string as your sign-off on the
> "From:" line in your messages for a bit of human touch ;-) Can you
> tweak your MUA to make that happen?
> 
> The second preference is to add an in-body header (i.e. as the first
> line of the body of the message) so that the body of the message
> starts like this:
> 
> From: Josh Steadmon 
> 
> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
> 
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
> 
> Signed-off-by: Josh Steadmon 
> ---
>  remote-curl.c   | 4 +++-
>  t/lib-httpd.sh  | 1 +
>  t/lib-httpd/apache.conf | 4 
> 
> Either way would make sure that the resulting patch's author line
> will be attributed correctly to the same identity as who is signing
> it off first as the author.
> 
> Thanks.

This should be fixed for future patch submissions. Thanks for the
heads-up.


Re: [PATCH] Makefile: use CXXFLAGS for linking fuzzers

2018-11-13 Thread Josh Steadmon
On 2018.11.13 12:12, Junio C Hamano wrote:
> stead...@google.com writes:
> 
> > OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in
> > CFLAGS causes lots of build warnings. Using separate CXXFLAGS avoids
> > this.
> 
> We are not a C++ shop, so allow me to show ignorance about how
> projects that are OSS-Fuzz-enabled work.  Do they use one set of
> CXXFLAGS when compiling the "real thing" and a separate set (perhaps
> one is subset of the other, or perhaps these two just have overlap)
> of CXXFLAGS when building to link with the fuzzer?
> 
> What I am trying to get at is if this should be CXXFLAGS or
> CXX_FUZZER_FLAGS.  If the OSS-Fuzz-enabled C++ projects use one set
> of flags for the "main" part of the project (to produce binaries to
> be run by the end users) and then link with extra flags to work with
> fuzzers, I would imagine that they won't call the latter CXXFLAGS
> but call it something else, and we probably should follow suit if
> that is the case.
> 
> Not that we plan to (re)write the maint part of Git in C++ ever, so
> I am personally OK with sacrificing the most generic CXXFLAGS macro
> for the sole use of OSS-Fuzz linkage, but I'd prefer to leave the
> door open so that other things like OSS-Fuzz that require C++ can be
> added like your work does to the project.
> 
> Thanks.

OSS-Fuzz only provides one set of CXXFLAGS for use on both compiling
project C++ project files as well linking the fuzzers themselves. So in
the event that Git ever added any C++ sources, they would need to use
the same set of CXXFLAGS.

Given that, do you agree with Stefan that it is more intuitive to define
CXXFLAGS next to the fuzzer build rules, since that's the only place
it's used for now?


Re: [PATCH 1/1] apply --recount: allow "no-op hunks"

2018-11-12 Thread Josh Steadmon
On 2018.11.12 12:54, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin 
> 
> When editing patches e.g. in `git add -e`, it is quite common that a
> hunk ends up having no -/+ lines, i.e. it is now supposed to do nothing.
> 
> This use case was broken by ad6e8ed37bc1 (apply: reject a hunk that does
> not do anything, 2015-06-01) with the good intention of catching a very
> real, different issue in hand-edited patches.
> 
> So let's use the `--recount` option as the tell-tale whether the user
> would actually be okay with no-op hunks.
> 
> Add a test case to make sure that this use case does not regress again.
> 
> Signed-off-by: Johannes Schindelin 

Looks good to me.

Reviewed-by: Josh Steadmon 


Re: [PATCH] coccicheck: introduce 'pending' semantic patches

2018-11-12 Thread Josh Steadmon
On 2018.11.09 16:10, Stefan Beller wrote:
> From: SZEDER Gábor 
> 
> Teach `make coccicheck` to avoid patches named "*.pending.cocci" and
> handle them separately in a new `make coccicheck-pending` instead.
> This means that we can separate "critical" patches from "FYI" patches.
> The former target can continue causing Travis to fail its static
> analysis job, while the latter can let us keep an eye on ongoing
> (pending) transitions without them causing too much fallout.
> 
> Document the intended use-cases around these two targets.
> As the process around the pending patches is not yet fully explored,
> leave that out.
> 
> Based-on-work-by: SZEDER Gábor 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Junio C Hamano 
> ---
> 
> I dialed back on the workflow, as we may want to explore it first
> before writing it down.
> 
> Stefan

Looks good to me.

Reviewed-by: Josh Steadmon 


Re: [PATCH] upload-pack: fix broken if/else chain in config callback

2018-10-25 Thread Josh Steadmon
On 2018.10.24 10:50, Johannes Schindelin wrote:
> Maybe a lot of explanation, but definitely a good one. The explanation and
> the patch look good to me.
> 
> Thanks,
> Dscho

Agreed, as a newbie I definitely appreciate detailed explanations. Looks
good to me as well.

Reviewed-by: Josh Steadmon 


Re: [PATCH v3] archive: initialize archivers earlier

2018-10-25 Thread Josh Steadmon
On 2018.10.23 13:09, Junio C Hamano wrote:
> stead...@google.com writes:
> 
> > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> > index 2a97b27b0a..cfd5ca492f 100755
> > --- a/t/t5000-tar-tree.sh
> > +++ b/t/t5000-tar-tree.sh
> > @@ -39,6 +39,8 @@ test_lazy_prereq TAR_NEEDS_PAX_FALLBACK '
> >  
> >  test_lazy_prereq GZIP 'gzip --version'
> >  
> > +test_lazy_prereq ZIP 'zip --version'
> > +
> 
> There are a handful of zip implementations; Info-ZIP found on many
> Linux distros does support 'zip --version', but we may want to make
> sure this test covers different implementations of zip sufficiently.
> 
> Queuing this patch (or an update of it) on 'pu' and hoping those
> with zip from different origins to try it would not help very much,
> either, as zip implementations that do not react to "zip --version"
> would silently turn the prereq off without breaking anything.
> 
> In any case, please refrain from adding any ZIP prerequiste to t5000
> which is about tar; t5003-archive-zip may be a much better fit.  It
> has an already working machinery that validates the generated zip
> archive under UNZIP prerequisite, so we may not even have to invent
> our own ZIP prereq if we did so.

Ack. This has been removed in v4. V4 also has a test case in t5003 based
on Jeff's advice.

> > @@ -206,6 +208,19 @@ test_expect_success 'git archive with --output, 
> > override inferred format' '
> > test_cmp_bin b.tar d4.zip
> >  '
> >  
> > +test_expect_success GZIP 'git archive with --output and --remote creates 
> > .tgz' '
> > +   git archive --output=d5.tgz --remote=. HEAD &&
> > +   gzip -d -c < d5.tgz > d5.tar &&
> > +   test_cmp_bin b.tar d5.tar
> > +'
> 
> We try to write redirections without SP between redirection operator
> and target filename, i.e. "gzip -d -c d5.tar".

Fixed in v5.


Re: [PATCH v2] archive: initialize archivers earlier

2018-10-22 Thread Josh Steadmon
On 2018.10.22 20:06, Jeff King wrote:
> On Mon, Oct 22, 2018 at 04:51:27PM -0700, Josh Steadmon wrote:
> 
> > > > +test_expect_success GZIP 'git archive with --output and --remote uses 
> > > > expected format' '
> > > > +   git archive --output=d5.tgz --remote=. HEAD &&
> > > > +   gzip -d -c < d5.tgz > d5.tar &&
> > > > +   test_cmp_bin b.tar d5.tar
> > > > +'
> > > 
> > > This nicely tests the more-interesting tgz case. But unfortunately it
> > > won't run on machines without the GZIP prerequisite. I'd think that
> > > would really be _most_ machines, but is it worth having a separate zip
> > > test to cover machines without gzip? I guess that just creates the
> > > opposite problem: not everybody has ZIP.
> > 
> > Added a test to compare the file lists from the .zip file to the
> > reference .tar file. I'm not sure if this is the best way to do things,
> > but it at least verifies that a .zip is produced. However, it's brittle
> > if the output of "zip -sf" changes. Let me know if you have a better
> > idea.
> 
> I wonder if we could do something more black-box. What we really care
> about here is not the exact output, but rather that "-o foo.zip"
> produces the same output as "--format zip". Could we do that without
> even relying on ZIP?
> 
> I think it should follow even for tgz, because we use "-n" for a
> repeatable output. But there we are relying on an external gzip just to
> _create_ the file, so we'd still need the GZIP prereq.
> 
> Hmm. Looks like we already have a similar test in t5003. So maybe just:
> 
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 55c7870997..cf19f56924 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -158,11 +158,16 @@ test_expect_success 'git archive --format=zip with 
> --output' \
>  'git archive --format=zip --output=d2.zip HEAD &&
>  test_cmp_bin d.zip d2.zip'
>  
> -test_expect_success 'git archive with --output, inferring format' '
> +test_expect_success 'git archive with --output, inferring format (local)' '
>   git archive --output=d3.zip HEAD &&
>   test_cmp_bin d.zip d3.zip
>  '
>  
> +test_expect_success 'git archive with --output, ferring format (remote)' '
> + git archive --remote=. --output=d4.zip HEAD &&
> + test_cmp_bin d.zip d4.zip
> +'
> +
>  test_expect_success \
>  'git archive --format=zip with prefix' \
>  'git archive --format=zip --prefix=prefix/ HEAD >e.zip'
> 
> which I think exposes the bug and can run everywhere?

Makes sense, thanks!


Re: [PATCH v2] archive: initialize archivers earlier

2018-10-22 Thread Josh Steadmon
On 2018.10.22 18:35, Jeff King wrote:
> On Mon, Oct 22, 2018 at 02:48:11PM -0700, stead...@google.com wrote:
> 
> > Initialize archivers as soon as possible when running git-archive and
> > git-upload-archive. Various non-obvious behavior depends on having the
> > archivers initialized, such as determining the desired archival format
> > from the provided filename.
> > 
> > Since 08716b3c11 ("archive: refactor file extension format-guessing",
> > 2011-06-21), archive_format_from_filename() has used the registered
> > archivers to match filenames (provided via --output) to archival
> > formats. However, when git-archive is executed with --remote, format
> > detection happens before the archivers have been registered. This causes
> > archives from remotes to always be generated as TAR files, regardless of
> > the actual filename (unless an explicit --format is provided).
> > 
> > This patch fixes that behavior; archival format is determined properly
> > from the output filename, even when --remote is used.
> > 
> > Signed-off-by: Josh Steadmon 
> > Helped-by: Jeff King 
> 
> Thanks, this looks good overall.
> 
> A few minor comments (that I'm not even sure are worth re-rolling for):
> 
> > diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> > index 25d9116356..3f35ebcfe8 100644
> > --- a/builtin/upload-archive.c
> > +++ b/builtin/upload-archive.c
> > @@ -43,6 +43,7 @@ int cmd_upload_archive_writer(int argc, const char 
> > **argv, const char *prefix)
> > }
> >  
> > /* parse all options sent by the client */
> > +   init_archivers();
> > return write_archive(sent_argv.argc, sent_argv.argv, prefix,
> >  the_repository, NULL, 1);
> >  }
> 
> This seems to separate the comment from what it describes. Any reason
> not to just init_archivers() closer to the top of the function here
> (probably after the enter_repo() call)?

Ack, fixed.


> > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> > index 2a97b27b0a..3e95fdf660 100755
> > --- a/t/t5000-tar-tree.sh
> > +++ b/t/t5000-tar-tree.sh
> > @@ -206,6 +206,12 @@ test_expect_success 'git archive with --output, 
> > override inferred format' '
> > test_cmp_bin b.tar d4.zip
> >  '
> >  
> > +test_expect_success GZIP 'git archive with --output and --remote uses 
> > expected format' '
> > +   git archive --output=d5.tgz --remote=. HEAD &&
> > +   gzip -d -c < d5.tgz > d5.tar &&
> > +   test_cmp_bin b.tar d5.tar
> > +'
> 
> This nicely tests the more-interesting tgz case. But unfortunately it
> won't run on machines without the GZIP prerequisite. I'd think that
> would really be _most_ machines, but is it worth having a separate zip
> test to cover machines without gzip? I guess that just creates the
> opposite problem: not everybody has ZIP.

Added a test to compare the file lists from the .zip file to the
reference .tar file. I'm not sure if this is the best way to do things,
but it at least verifies that a .zip is produced. However, it's brittle
if the output of "zip -sf" changes. Let me know if you have a better
idea.


Re: [PATCH v2 1/1] protocol: advertise multiple supported versions

2018-10-22 Thread Josh Steadmon
On 2018.10.12 15:30, Stefan Beller wrote:
> On Thu, Oct 11, 2018 at 6:02 PM  wrote:
> >
> > From: Josh Steadmon 
> >
> > Currently the client advertises that it supports the wire protocol
> > version set in the protocol.version config. However, not all services
> > support the same set of protocol versions. When connecting to
> > git-receive-pack, the client automatically downgrades to v0 if
> > config.protocol is set to v2, but this check is not performed for other
> > services.
> >
> > This patch creates a protocol version registry. Individual commands
> > register all the protocol versions they support prior to communicating
> > with a server. Versions should be listed in preference order; the
> > version specified in protocol.version will automatically be moved to the
> > front of the registry.
> >
> > The protocol version registry is passed to remote helpers via the
> > GIT_PROTOCOL environment variable.
> >
> > Clients now advertise the full list of registered versions. Servers
> > select the first recognized version from this advertisement.
> 
> I like this patch from the users point of view (i.e. inside fetch/push etc),
> and I need to through the details in connect/protocol as that seems
> to be a lot of new code, but hides the complexity very well.
> 
> > +   register_allowed_protocol_version(protocol_v2);
> > +   register_allowed_protocol_version(protocol_v1);
> > +   register_allowed_protocol_version(protocol_v0);
> 
> Would it make sense to have a
> 
> register_allowed_protocol_versions(protocol_v2, protocol_v1,
> protocol_v0, NULL);
> 
> similar to argv_array_pushl  or would that be overengineered?

Hmm, well it wouldn't be quite as clean as the argv_* versions, since we
can't take the pointer of an enum value, so we don't have a good
stop-value for the va_list. I suppose we could use
protocol_unknown_version, but that seems unintuitive to me. We could
also pass in an explicit argument count, but... ugh.

Am I missing some other way to do this cleanly? I'll admit I'm not very
familiar with va_lists.

> > @@ -1085,10 +1085,9 @@ static struct child_process *git_connect_git(int 
> > fd[2], char *hostandport,
> > target_host, 0);
> >
> > /* If using a new version put that stuff here after a second null 
> > byte */
> > -   if (version > 0) {
> > +   if (strcmp(version_advert->buf, "version=0")) {
> > strbuf_addch(, '\0');
> > -   strbuf_addf(, "version=%d%c",
> > -   version, '\0');
> > +   strbuf_addf(, "%s%c", version_advert->buf, '\0');
> 
> It's a bit unfortunate to pass around the string, but reading through
> the following
> lines seems like it is easiest.
> 
> 
> > @@ -1226,16 +1226,10 @@ struct child_process *git_connect(int fd[2], const 
> > char *url,
> >  {
> > char *hostandport, *path;
> > struct child_process *conn;
> > +   struct strbuf version_advert = STRBUF_INIT;
> > enum protocol protocol;
> > -   enum protocol_version version = get_protocol_version_config();
> >
> > -   /*
> > -* NEEDSWORK: If we are trying to use protocol v2 and we are 
> > planning
> > -* to perform a push, then fallback to v0 since the client doesn't 
> > know
> > -* how to push yet using v2.
> > -*/
> > -   if (version == protocol_v2 && !strcmp("git-receive-pack", prog))
> > -   version = protocol_v0;
> > +   get_client_protocol_version_advertisement(_advert);
> 
> Nice to have this special casing gone!
> 
> > diff --git a/protocol.c b/protocol.c
> > index 5e636785d1..f788269c47 100644
> > +
> > +static const char protocol_v0_string[] = "0";
> > +static const char protocol_v1_string[] = "1";
> > +static const char protocol_v2_string[] = "2";
> > +
> ...
> > +/* Return the text representation of a wire protocol version. */
> > +static const char *format_protocol_version(enum protocol_version version)
> > +{
> > +   switch (version) {
> > +   case protocol_v0:
> > +   return protocol_v0_string;
> > +   case protocol_v1:
> > +   return protocol_v1_string;
> > +   case protocol_v2:
> > +   return protocol_v2_string;
> > +   case protocol_unknown_version:
> > +   die(_("Unrecognized protocol version"));
> > +   }
> > +   die

Re: [PATCH 1/1] archive: init archivers before determining format

2018-10-22 Thread Josh Steadmon
On 2018.10.19 19:59, Jeff King wrote:
> On Fri, Oct 19, 2018 at 04:19:28PM -0700, stead...@google.com wrote:
> 
> > diff --git a/builtin/archive.c b/builtin/archive.c
> > index e74f675390..dd3283a247 100644
> > --- a/builtin/archive.c
> > +++ b/builtin/archive.c
> > @@ -45,7 +45,10 @@ static int run_remote_archiver(int argc, const char 
> > **argv,
> >  * it.
> >  */
> > if (name_hint) {
> > -   const char *format = archive_format_from_filename(name_hint);
> > +   const char *format;
> > +   init_tar_archiver();
> > +   init_zip_archiver();
> > +   format = archive_format_from_filename(name_hint);
> > if (format)
> > packet_write_fmt(fd[1], "argument --format=%s\n", 
> > format);
> 
> Hrm. This code was added back in 56baa61d01 (archive: move file
> extension format-guessing lower, 2011-06-21), and your example
> invocation worked back then!
> 
> Unfortunately it was broken by the very next patch in the series,
> 08716b3c11 (archive: refactor file extension format-guessing,
> 2011-06-21). I guess that's what I get for not adding regression tests.
> 
> It's probably worth mentioning those points in the commit message.

Done.

> Does this work with configured archiver extensions, too? I think so,
> because we load them via init_tar_archiver().

If you mean things like .tgz and .tar.gz, then yes, they are affected by
the bug as well, and this patch fixes them. The test included in v2 uses
a .tgz file.

> Can we avoid repeating the list of archivers here? This needs to stay in
> sync with the list in write_archive(). I know there are only two, but
> can we factor out an init_archivers() call or something?

Done.

> We also should probably just call it unconditionally when we start the
> archiver command (I don't think there are any other bugs like this
> lurking, but it doesn't cost very much to initialize these; it makes
> sense to just do it early).

Done.

> Other than those minor points (and the lack of test), your fix looks
> good to me.

Thanks for the review!


Re: [PATCH 0/1] Fix format detection when archiving remotely

2018-10-22 Thread Josh Steadmon
On 2018.10.19 19:41, Jeff King wrote:
> On Fri, Oct 19, 2018 at 04:19:27PM -0700, stead...@google.com wrote:
> 
> > Currently, git-archive does not properly determine the desired archive
> > format when both --output and --remote are provided, because
> > run_remote_archiver() does not initialize the archivers prior to calling
> > archive_format_from_filename(). This results in the remote archiver
> > always returning a TAR file, regardless of the requested format.
> > 
> > This patch initializes the TAR and ZIP archivers before calling
> > archive_format_from_filename(), which fixes format detection.
> 
> It seems like some of this content could be in the commit message of the
> actual patch.

Ack. I'll be sending v2 shortly, please let me know if I've missed
anything that should be included.

> > Steps to reproduce:
> > 
> > ∫ git version
> > git version 2.19.1.568.g152ad8e336-goog
> > ∫ cd ~/src/git
> > ∫ git archive --output ~/good.zip HEAD
> > ∫ file ~/good.zip
> > /home/steadmon/good.zip: Zip archive data, at least v1.0 to extract
> > ∫ git archive --output ~/bad.zip --remote=. HEAD
> > ∫ file ~/bad.zip
> > /home/steadmon/bad.zip: POSIX tar archive
> 
> And this could be in a test script in the actual patch. :)

Done.


Re: What's cooking in git.git (Oct 2018, #02; Sat, 13)

2018-10-16 Thread Josh Steadmon
On 2018.10.12 23:53, Junio C Hamano wrote:
> * js/remote-archive-v2 (2018-09-28) 4 commits
>   (merged to 'next' on 2018-10-12 at 5f34377f60)
>  + archive: allow archive over HTTP(S) with proto v2
>  + archive: implement protocol v2 archive command
>  + archive: use packet_reader for communications
>  + archive: follow test standards around assertions
> 
>  The original implementation of "git archive --remote" more or less
>  bypassed the transport layer and did not work over http(s).  The
>  version 2 of the protocol is defined to allow going over http(s) as
>  well as Git native transport.
> 
>  Will merge to 'master'.

The first two patches (test cleanup and packet_reader refactor) are OK,
but the latter two will break the archive command when an old client
attempts to talk to a new server (due to the version advertisement
problem noted in [1]). Sorry that I didn't catch that these had made it
into next.

[1]: https://public-inbox.org/git/20180927223314.ga230...@google.com/#t


Re: [RFC PATCH 1/2] fuzz: Add basic fuzz testing target.

2018-10-12 Thread Josh Steadmon
On 2018.10.10 11:14, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> > +FUZZ_OBJS += fuzz-pack-headers.o
> > +
> > +FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
> > +
> > ...
> > +### Fuzz testing
> > +#
> > +.PHONY: fuzz-clean fuzz-objs fuzz-compile
> 
> I take it that you anticipate the fuzz programs in the future all
> be named fuzz-$(blah), whose source is fuzz-$(blah).o (even though
> we may grow some common code that may be linked with them, which can
> be done by tweaking the rule for the $(FUZZ_PROGRAMS) target).  Am I
> reading you correctly?  Would fuzz-{clean,objs,compile} risk squatting
> on nicer names we may want to use for $(blah) down the line?

Yes, that's correct. I've reworked the rules to be more compatible with
how OSS-Fuzz expects to build these targets, and now "fuzz-all" is
the only remaining special target.

> > + ...
> > +$(FUZZ_PROGRAMS): fuzz-compile
> > +   clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
> > +   $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@
> 
> Is the expected usage pattern to know a single fuzz-* program the
> builder wants to build, to run "make fuzz-pack-headers"?  If not, it
> also would be a good idea to have something like
> 
> fuzz-build-all:: $(FUZZ_PROGRAMS)
> .PHONY: fuzz-build-all
> 
> perhaps?
> 
> Also, in the final version we unleash to general developer audience,
> we'd want to support "make V=1" (and "make" that is "$(QUIET)").

Done and done.


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-12 Thread Josh Steadmon
On 2018.10.12 16:32, Jonathan Nieder wrote:
> Josh Steadmon wrote:
> 
> > For now, I'm going to try adding an --allowed_versions flag for the
> > remote helpers, but if anyone has a better idea, let me know.
> 
> I forgot to mention: the stateless-connect remote helper capability is
> still experimental so you're free to change its interface (e.g. to
> change the syntax of the stateless-connect command it provides).

For v2 of this patch, I ended up using GIT_PROTOCOL to pass the version
string to the remote helpers.


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-10 Thread Josh Steadmon
On 2018.10.05 12:25, Stefan Beller wrote:
> > > > I suppose if we are strict about serving from a single endpoint, the
> > > > version registry makes more sense, and individual operations can declare
> > > > acceptable version numbers before calling any network code?
> > >
> > > Ah yeah, that makes sense!
> >
> > Thinking out loud here. Please let me know if I say something stupid :)
> >
> > So we'll have (up to) three pieces of version information we'll care
> > about for version negotiation:
> >
> > 1. (maybe) a client-side protocol.version config entry
> 
> (and in case we don't, we have it implicit ly hardcoded, as
> we have to choose the default for users that don't care themselves about
> this setting)
> 
> > 2. a list of acceptable proto versions for the currently running
> >operation on the client
> > 3. a list of acceptable proto versions for the server endpoint that
> >handles the request
> 
> Yes that matches my understanding. The issue is between (1) and (2)
> as (1) is in a generic config, whereas (2) is specific to the command,
> such that it may differ. And as a user you may want to express things
> like: "use the highest version", which is done by setting (1) to "version 2"
> despite (2) not having support of all commands for v2.
> 
> > According to the doc on protocol.version: "If set, clients will attempt
> > to communicate with a server using the specified protocol version. If
> > unset, no attempt will be made by the client to communicate using a
> > particular protocol version, this results in protocol version 0 being
> > used."
> >
> > So, if protocol.version is not set, or set to 0, the client should not
> > attempt any sort of version negotiation.
> 
> Yes, as version 0 is unaware of versions, i.e. there are old installations
> out there where all the versioning code is not there, so in case of an
> old client the new server *must* speak v0 to be able to communicate
> (and vice versa).
> 
> 
> > Otherwise, the client prefers a
> > particular version, but we don't guarantee that they will actually use
> > that version after the (unspecified) version negotiation procedure.
> >
> > If protocol.version is set to something other than 0, we construct a
> > list of acceptable versions for the given operation. If the
> > protocol.version entry is present in that list, we move it to the front
> > of the list to note that it is the preferred version. We send all of
> > these, in order, in the request.
> >
> > When the server endpoint begins to handle a request, it first constructs
> > a list of acceptable versions. If the client specifies a list of
> > versions, we check them one-by-one to see if they are acceptable. If we
> > find a match, we use that version. If we don't find any matches or if
> > the client did not send a version list, we default to v0.
> >
> > Seem reasonable?
> 
> This sounds super reasonable!

So this runs into problems with remote-curl (and possibly other remote
helpers):

builtin/push.c can declare whatever allowed versions it wants, but those
are not carried over when remote-curl is started to actually talk to the
remote. What's worse, remote-curl starts its HTTP connection before it
knows what command it's actually acting as a helper for.

For now, I'm going to try adding an --allowed_versions flag for the
remote helpers, but if anyone has a better idea, let me know.


Thanks,
Josh


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-04 Thread Josh Steadmon
On 2018.10.03 15:47, Stefan Beller wrote:
> On Wed, Oct 3, 2018 at 2:34 PM Josh Steadmon  wrote:
> >
> > Is there a method or design for advertising multiple acceptable versions
> > from the client?
> 
> I think the client can send multiple versions, looking through protocol.c
> (and not the documentation as I should for this:)
> 
>   /*
>* Determine which protocol version the client has requested.  Since
>* multiple 'version' keys can be sent by the client, indicating that
>* the client is okay to speak any of them, select the greatest version
>* that the client has requested.  This is due to the assumption that
>* the most recent protocol version will be the most state-of-the-art.
>*/
> ...
> const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> string_list_split(, git_protocol, ':', -1);
> ...
> for_each_string_list_item(item, ) {
> ...
> if (skip_prefix(item->string, "version=", ))
> ...
> 
> in determine_protocol_version_server which already had the client
> speak to it, so I think at least the server can deal with multiple versions.
> 
> But given that we transport the version in env variables, we'd
> need to be extra careful if we just did not see the version
> in the --remote=. above?

Sorry, I'm not sure I understand this. What about env variables requires
caution?


> > From my understanding, we can only add a single
> > version=X field in the advertisement, but IIUC we can extend this fairly
> > easily? Perhaps we can have "version=X" to mean the preferred version,
> > and then a repeatable "acceptable_version=Y" field or similar?
> 
> Just re-use "version X", separated by colons as above.
> 
> > > From a maintenance perspective, do we want to keep
> > > this part of the code central, as it ties protocol (as proxied
> > > by service name) to the max version number?
> > > I would think that we'd rather have the decision local to the
> > > code, i.e. builtin/fetch would need to tell protocol.c that it
> > > can do (0,1,2) and builtin/push can do (0,1), and then the
> > > networking layers of code would figure out by the input
> > > from the caller and the input from the user (configured
> > > protocol.version) what is the best to go forward from
> > > then on.
> >
> > I like having it centralized, because enforcing this in git_connect()
> > and discover_refs() catches all the outgoing version advertisements, but
> > there's lots of code paths that lead to those two functions that would
> > all have to have the acceptable version numbers plumbed through.
> 
> Makes sense.
> 
> > I suppose we could also have a registry of services to version numbers,
> > but I tend to dislike non-local sources of data. But if the list likes
> > that approach better, I'll be happy to implement it.
> 
> > > But I guess having the central place here is not to
> > > bad either. How will it cope with the desire of protocol v2
> > > to have only one end point (c.f. serve.{c,h} via builtin/serve
> > > as "git serve") ?
> >
> > I'm not sure about this. In my series to add a v2 archive command, I
> > added support for a new endpoint for proto v2 and I don't recall seeing
> > any complaints, but that is still open for review.
> 
> Ah I guess new end points would imply that you can speak at least
> a given version N.
> 
> > I suppose if we are strict about serving from a single endpoint, the
> > version registry makes more sense, and individual operations can declare
> > acceptable version numbers before calling any network code?
> 
> Ah yeah, that makes sense!

Thinking out loud here. Please let me know if I say something stupid :)

So we'll have (up to) three pieces of version information we'll care
about for version negotiation:

1. (maybe) a client-side protocol.version config entry
2. a list of acceptable proto versions for the currently running
   operation on the client
3. a list of acceptable proto versions for the server endpoint that
   handles the request

According to the doc on protocol.version: "If set, clients will attempt
to communicate with a server using the specified protocol version. If
unset, no attempt will be made by the client to communicate using a
particular protocol version, this results in protocol version 0 being
used."

So, if protocol.version is not set, or set to 0, the client should not
attempt any sort of version negotiation. Otherwise, the client prefers a
particular version, but we don't guarantee that they will actually use
that version after the (unspecified) version nego

[RFC PATCH 2/2] fuzz: Add fuzz testing for packfile indices.

2018-10-04 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  5 -
 fuzz-pack-idx.c | 13 +
 packfile.c  | 44 +---
 packfile.h  | 13 +
 5 files changed, 56 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-pack-idx.c

diff --git a/.gitignore b/.gitignore
index 87a28b3115..64b3377d40 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 /fuzz_corpora
 /fuzz-pack-headers
+/fuzz-pack-idx
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index 10bb82b115..f7e6be8f19 100644
--- a/Makefile
+++ b/Makefile
@@ -685,6 +685,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 ETAGS_TARGET = TAGS
 
 FUZZ_OBJS += fuzz-pack-headers.o
+FUZZ_OBJS += fuzz-pack-idx.o
 
 FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
 
@@ -3071,7 +3072,7 @@ cover_db_html: cover_db
 
 ### Fuzz testing
 #
-.PHONY: fuzz-clean fuzz-objs fuzz-compile
+.PHONY: fuzz-clean fuzz-objs fuzz-compile fuzz-all
 
 FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address
 FUZZ_LDFLAGS = $(FUZZ_CFLAGS)
@@ -3089,3 +3090,5 @@ fuzz-compile:
 $(FUZZ_PROGRAMS): fuzz-compile
clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
$(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@
+
+fuzz-all: $(FUZZ_PROGRAMS)
diff --git a/fuzz-pack-idx.c b/fuzz-pack-idx.c
new file mode 100644
index 00..0c3d777aac
--- /dev/null
+++ b/fuzz-pack-idx.c
@@ -0,0 +1,13 @@
+#include "object-store.h"
+#include "packfile.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+   struct packed_git p;
+
+   load_idx("fuzz-input", GIT_SHA1_RAWSZ, (void *)data, size, );
+
+   return 0;
+}
diff --git a/packfile.c b/packfile.c
index 841b36182f..86074a76e9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -80,10 +80,8 @@ void pack_report(void)
 static int check_packed_git_idx(const char *path, struct packed_git *p)
 {
void *idx_map;
-   struct pack_idx_header *hdr;
size_t idx_size;
-   uint32_t version, nr, i, *index;
-   int fd = git_open(path);
+   int fd = git_open(path), ret;
struct stat st;
const unsigned int hashsz = the_hash_algo->rawsz;
 
@@ -101,16 +99,32 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
 
-   hdr = idx_map;
+   ret = load_idx(path, hashsz, idx_map, idx_size, p);
+
+   if (ret)
+   munmap(idx_map, idx_size);
+
+   return ret;
+}
+
+int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
+size_t idx_size, struct packed_git *p)
+{
+   struct pack_idx_header *hdr = idx_map;
+   uint32_t version, nr, i, *index;
+
+   if (idx_size < 4 * 256 + hashsz + hashsz)
+   return error("index file %s is too small", path);
+   if (idx_map == NULL)
+   return error("empty data");
+
if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
version = ntohl(hdr->idx_version);
-   if (version < 2 || version > 2) {
-   munmap(idx_map, idx_size);
+   if (version < 2 || version > 2)
return error("index file %s is version %"PRIu32
 " and is not supported by this binary"
 " (try upgrading GIT to a newer version)",
 path, version);
-   }
} else
version = 1;
 
@@ -120,10 +134,8 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
index += 2;  /* skip index header */
for (i = 0; i < 256; i++) {
uint32_t n = ntohl(index[i]);
-   if (n < nr) {
-   munmap(idx_map, idx_size);
+   if (n < nr)
return error("non-monotonic index %s", path);
-   }
nr = n;
}
 
@@ -135,10 +147,8 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
 *  - hash of the packfile
 *  - file checksum
 */
-   if (idx_size != 4*256 + nr * (hashsz + 4) + hashsz + hashsz) {
-   munmap(idx_map, idx_size);
+   if (idx_size != 4 * 256 + nr * (hashsz + 4) + hashsz + hashsz)
return error("wrong index v1 file size in %s", path);
-   }
} else if (version == 2) {
/*
 * Minimum size:
@@ -157,20 +167,16 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
unsigned long

[RFC PATCH 0/2] add fuzzing targets for use with LLVM libFuzzer

2018-10-04 Thread Josh Steadmon
libFuzzer[1] is a fuzzing engine included in recent versions of LLVM. It
is used by OSS-Fuzz[2] for continuous fuzzing of OSS projects.

This series adds two basic fuzzing targets covering packfile header and
index code. It is not particularly portable, and requires the use of
LLVM v4.0 (the latest version available on my workstation). I would
particularly appreciate advice on how to make the Makefile more
portable.

[1]: https://llvm.org/docs/LibFuzzer.html
[2]: https://github.com/google/oss-fuzz

Josh Steadmon (2):
  fuzz: Add basic fuzz testing target.
  fuzz: Add fuzz testing for packfile indices.

 .gitignore  |  3 +++
 Makefile| 33 -
 fuzz-pack-headers.c | 14 ++
 fuzz-pack-idx.c | 13 +
 packfile.c  | 44 +---
 packfile.h  | 13 +
 6 files changed, 100 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-pack-headers.c
 create mode 100644 fuzz-pack-idx.c

-- 
2.19.0.605.g01d371f741-goog



[RFC PATCH 1/2] fuzz: Add basic fuzz testing target.

2018-10-04 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 .gitignore  |  2 ++
 Makefile| 30 +-
 fuzz-pack-headers.c | 14 ++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 fuzz-pack-headers.c

diff --git a/.gitignore b/.gitignore
index 9d1363a1eb..87a28b3115 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,5 @@
+/fuzz_corpora
+/fuzz-pack-headers
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index 13e1c52478..10bb82b115 100644
--- a/Makefile
+++ b/Makefile
@@ -590,6 +590,8 @@ XDIFF_OBJS =
 VCSSVN_OBJS =
 GENERATED_H =
 EXTRA_CPPFLAGS =
+FUZZ_OBJS =
+FUZZ_PROGRAMS =
 LIB_OBJS =
 PROGRAM_OBJS =
 PROGRAMS =
@@ -682,6 +684,10 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-pack-headers.o
+
+FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
+
 # Empty...
 EXTRA_PROGRAMS =
 
@@ -2250,6 +2256,7 @@ TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) 
$(patsubst %,t/helper/%,$(TEST
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
$(XDIFF_OBJS) \
$(VCSSVN_OBJS) \
+   $(FUZZ_OBJS) \
common-main.o \
git.o
 ifndef NO_CURL
@@ -2931,7 +2938,7 @@ profile-clean:
 cocciclean:
$(RM) contrib/coccinelle/*.cocci.patch*
 
-clean: profile-clean coverage-clean cocciclean
+clean: profile-clean coverage-clean cocciclean fuzz-clean
$(RM) *.res
$(RM) $(OBJECTS)
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
@@ -3061,3 +3068,24 @@ cover_db: coverage-report
 cover_db_html: cover_db
cover -report html -outputdir cover_db_html cover_db
 
+
+### Fuzz testing
+#
+.PHONY: fuzz-clean fuzz-objs fuzz-compile
+
+FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address
+FUZZ_LDFLAGS = $(FUZZ_CFLAGS)
+
+
+fuzz-clean:
+   $(RM) $(FUZZ_PROGRAMS) $(FUZZ_OBJS)
+
+fuzz-objs: $(FUZZ_OBJS)
+
+fuzz-compile:
+   $(MAKE) CC=clang LD=clang CFLAGS="$(FUZZ_CFLAGS)" \
+   LDFLAGS="$(FUZZ_LDFLAGS)" all fuzz-objs
+
+$(FUZZ_PROGRAMS): fuzz-compile
+   clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
+   $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@
diff --git a/fuzz-pack-headers.c b/fuzz-pack-headers.c
new file mode 100644
index 00..99da1d0fd3
--- /dev/null
+++ b/fuzz-pack-headers.c
@@ -0,0 +1,14 @@
+#include "packfile.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+   enum object_type type;
+   unsigned long len;
+
+   unpack_object_header_buffer((const unsigned char *)data,
+   (unsigned long)size, , );
+
+   return 0;
+}
-- 
2.19.0.605.g01d371f741-goog



Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-03 Thread Josh Steadmon
On 2018.10.02 15:28, Stefan Beller wrote:
> On Tue, Oct 2, 2018 at 3:00 PM Josh Steadmon  wrote:
> >
> > For services other than git-receive-pack, clients currently advertise
> > that they support the version set in the protocol.version config,
> > regardless of whether or not there is actually an implementation of that
> > service for the given protocol version. This causes backwards-
> > compatibility problems when a new implementation for the given
> > protocol version is added.
> >
> > This patch sets maximum allowed protocol versions for git-receive-pack,
> > git-upload-archive, and git-upload-pack.
> >
> > Previously, git-receive-pack would downgrade from v2 to v0, but would
> > allow v1 if set in protocol.version. Now, it will downgrade from v2 to
> > v1.
> 
> But does git-receive-pack understand v1?
> As to my understanding we have not even defined v1
> for push (receive-pack) and archive --remote (upload-archive).
> v1 is only known to fetch (upload-pack).
> 
> > +enum protocol_version determine_maximum_protocol_version(
> > +   const char *service, enum protocol_version default_version)
> > +{
> > +   if (!strcmp(service, "git-receive-pack"))
> > +   return protocol_v1;
> > +   else if (!strcmp(service, "git-upload-archive"))
> > +   return protocol_v1;
> 
> so I would think these two would be _v0.
> ... goes and checks ...
> aa9bab29b8 (upload-pack, receive-pack: introduce protocol version 1,
> 2017-10-16) seems to actually teach v1 to receive-pack as well,
> but upload-archive was completely off radar, so I think returning
> (v1, v0, v2 in the order as in the code) would make sense?

I believe that git-upload-archive can still speak version 1 without any
trouble, but it at least doesn't break anything in the test suite to
limit this to v0 either.


> Asides from this, I thought there was a deliberate decision
> that we'd want to avoid a strict order on the protocol versions,
> but I could not find prior discussion on list to back up this claim. :/
> 
> For example we'd go with e.g. enums instead of integers
> for version numbers, as then some internal setup could
> also have things like protocol_v2018-10-02 or protocol_vWhatever;
> some protocol version may be advantageous to the client, some to
> the server, and we'd need to negotiate the best version that both
> are happy with. (e.g. the server may like version 0, 2 and 3, and
> the client may like 0,2,4 as 3 is bad security wise for the client,
> so both would negotiate to 2 as their best case)

Is there a method or design for advertising multiple acceptable versions
from the client? From my understanding, we can only add a single
version=X field in the advertisement, but IIUC we can extend this fairly
easily? Perhaps we can have "version=X" to mean the preferred version,
and then a repeatable "acceptable_version=Y" field or similar?


> From a maintenance perspective, do we want to keep
> this part of the code central, as it ties protocol (as proxied
> by service name) to the max version number?
> I would think that we'd rather have the decision local to the
> code, i.e. builtin/fetch would need to tell protocol.c that it
> can do (0,1,2) and builtin/push can do (0,1), and then the
> networking layers of code would figure out by the input
> from the caller and the input from the user (configured
> protocol.version) what is the best to go forward from
> then on.

I like having it centralized, because enforcing this in git_connect()
and discover_refs() catches all the outgoing version advertisements, but
there's lots of code paths that lead to those two functions that would
all have to have the acceptable version numbers plumbed through.

I suppose we could also have a registry of services to version numbers,
but I tend to dislike non-local sources of data. But if the list likes
that approach better, I'll be happy to implement it.


> But I guess having the central place here is not to
> bad either. How will it cope with the desire of protocol v2
> to have only one end point (c.f. serve.{c,h} via builtin/serve
> as "git serve") ?

I'm not sure about this. In my series to add a v2 archive command, I
added support for a new endpoint for proto v2 and I don't recall seeing
any complaints, but that is still open for review.

I suppose if we are strict about serving from a single endpoint, the
version registry makes more sense, and individual operations can declare
acceptable version numbers before calling any network code?



Thanks for the review,
Josh


[PATCH 1/1] protocol: limit max protocol version per service

2018-10-02 Thread Josh Steadmon
For services other than git-receive-pack, clients currently advertise
that they support the version set in the protocol.version config,
regardless of whether or not there is actually an implementation of that
service for the given protocol version. This causes backwards-
compatibility problems when a new implementation for the given
protocol version is added.

This patch sets maximum allowed protocol versions for git-receive-pack,
git-upload-archive, and git-upload-pack.

Previously, git-receive-pack would downgrade from v2 to v0, but would
allow v1 if set in protocol.version. Now, it will downgrade from v2 to
v1.

Signed-off-by: Josh Steadmon 
---
 connect.c | 11 ---
 protocol.c| 13 +
 protocol.h|  7 +++
 remote-curl.c | 11 ---
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/connect.c b/connect.c
index 94547e5056..4a8cd78239 100644
--- a/connect.c
+++ b/connect.c
@@ -1228,14 +1228,11 @@ struct child_process *git_connect(int fd[2], const char 
*url,
struct child_process *conn;
enum protocol protocol;
enum protocol_version version = get_protocol_version_config();
+   enum protocol_version max_version;
 
-   /*
-* NEEDSWORK: If we are trying to use protocol v2 and we are planning
-* to perform a push, then fallback to v0 since the client doesn't know
-* how to push yet using v2.
-*/
-   if (version == protocol_v2 && !strcmp("git-receive-pack", prog))
-   version = protocol_v0;
+   max_version = determine_maximum_protocol_version(prog, version);
+   if (version > max_version)
+   version = max_version;
 
/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
diff --git a/protocol.c b/protocol.c
index 5e636785d1..1c553d8b99 100644
--- a/protocol.c
+++ b/protocol.c
@@ -79,3 +79,16 @@ enum protocol_version 
determine_protocol_version_client(const char *server_respo
 
return version;
 }
+
+enum protocol_version determine_maximum_protocol_version(
+   const char *service, enum protocol_version default_version)
+{
+   if (!strcmp(service, "git-receive-pack"))
+   return protocol_v1;
+   else if (!strcmp(service, "git-upload-archive"))
+   return protocol_v1;
+   else if (!strcmp(service, "git-upload-pack"))
+   return protocol_v2;
+
+   return default_version;
+}
diff --git a/protocol.h b/protocol.h
index 2ad35e433c..eabc0c5fab 100644
--- a/protocol.h
+++ b/protocol.h
@@ -31,4 +31,11 @@ extern enum protocol_version 
determine_protocol_version_server(void);
  */
 extern enum protocol_version determine_protocol_version_client(const char 
*server_response);
 
+/*
+ * Used by a client to determine the maximum protocol version to advertise for 
a
+ * particular service. If the service is unrecognized, return default_version.
+ */
+extern enum protocol_version determine_maximum_protocol_version(
+   const char *service, enum protocol_version default_version);
+
 #endif /* PROTOCOL_H */
diff --git a/remote-curl.c b/remote-curl.c
index fb28309e85..028adb76ae 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -344,6 +344,7 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
enum protocol_version version = get_protocol_version_config();
+   enum protocol_version max_version;
 
if (last && !strcmp(service, last->service))
return last;
@@ -360,13 +361,9 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
strbuf_addf(_url, "service=%s", service);
}
 
-   /*
-* NEEDSWORK: If we are trying to use protocol v2 and we are planning
-* to perform a push, then fallback to v0 since the client doesn't know
-* how to push yet using v2.
-*/
-   if (version == protocol_v2 && !strcmp("git-receive-pack", service))
-   version = protocol_v0;
+   max_version = determine_maximum_protocol_version(service, version);
+   if (version > max_version)
+   version = max_version;
 
/* Add the extra Git-Protocol header */
if (get_protocol_http_header(version, _header))
-- 
2.19.0.605.g01d371f741-goog



[PATCH 0/1] Limit client version advertisements

2018-10-02 Thread Josh Steadmon
As discussed in [1], clients will incorrectly advertise support for
protocol version 2 even when the service in question does not have a v2
implementation. This patch sets maximum protocol versions for
git-receive-pack, git-upload-archive, and git-upload-pack.

[1]: https://public-inbox.org/git/20180927012455.234876-1-stead...@google.com/

Josh Steadmon (1):
  protocol: limit max protocol version per service

 connect.c | 11 ---
 protocol.c| 13 +
 protocol.h|  7 +++
 remote-curl.c | 11 ---
 4 files changed, 28 insertions(+), 14 deletions(-)

-- 
2.19.0.605.g01d371f741-goog



Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support

2018-09-27 Thread Josh Steadmon
On 2018.09.27 15:20, Junio C Hamano wrote:
> Jonathan Nieder  writes:
> 
> >  1. Clients sending version=2 when they do not, in fact, speak protocol
> > v2 for a service is a (serious) bug.  (Separately from this
> > series) we should fix it.
> >
> >  2. That bug is already in the wild, alas.  Fortunately the semantics of
> > GIT_PROTOCOL as a list of key/value pairs is well defined.  So we
> > have choices of (a) bump version to version=3 (b) pass another
> > value 'version=2:yesreallyversion=2' (c) etc.
> >
> >  3. This is likely to affect push, too.
> 
> Do you mean that existing "git push", "git fetch" and "git archive"
> sends version=2 even when they are not capable of speaking protocol
> v2?  I thought that "git archive [--remote]" was left outside of the
> protocol update (that was the reason why the earlier attempt took a
> hacky route of "shallow clone followed by local archive"), so there
> is no "git archive" in the wild that can even say "version=$n"
> (which requires you to be at least version=1)?

Yes, the version on my desktop sends version=2 when archiving:

∫ which git
/usr/bin/git
∫ git --version
git version 2.19.0.605.g01d371f741-goog
∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \
  --enable=upload-archive \
  --base-path=${HOME}/src/bare-repos &
[1] 258496
∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar
∫ grep version ~/server_trace
15:31:22.377869 pkt-line.c:80   packet:  git< 
git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0


Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2

2018-09-27 Thread Josh Steadmon
On 2018.09.13 09:47, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> > Signed-off-by: Josh Steadmon 
> > ---
> >  builtin/archive.c  |  8 +++-
> >  http-backend.c | 10 +-
> >  transport-helper.c |  5 +++--
> >  3 files changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/builtin/archive.c b/builtin/archive.c
> > index 73831887d..5fa75b3f7 100644
> > --- a/builtin/archive.c
> > +++ b/builtin/archive.c
> > @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char 
> > **argv,
> > status = packet_reader_read();
> > if (status != PACKET_READ_FLUSH)
> > die(_("git archive: expected a flush"));
> > -   }
> > +   } else if (version == protocol_v2 &&
> > +  starts_with(transport->url, "http"))
> > +   /*
> > +* Commands over HTTP require two requests, so there's an
> > +* additional server response to parse.
> > +*/
> > +   discover_version();
> 
> What should happen if the version discovered here is different from
> v2 or the capabilities offered are different from what we saw
> before?  Perhaps we need some sanity checks, as we on this side of
> the connection know we are making two requests, and may even end up
> talking with an instance of "upload-archive" that is different from
> the one we talked with earlier.
> 
> > diff --git a/http-backend.c b/http-backend.c
> > index 458642ef7..d62d583c7 100644
> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -32,6 +32,7 @@ struct rpc_service {
> >  static struct rpc_service rpc_service[] = {
> > { "upload-pack", "uploadpack", 1, 1 },
> > { "receive-pack", "receivepack", 0, -1 },
> > +   { "upload-archive", "uploadarchive", 1, 1 },
> >  };
> >  
> >  static struct string_list *get_parameters(void)
> > @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char 
> > *service_name)
> > struct rpc_service *svc = select_service(hdr, service_name);
> > struct strbuf buf = STRBUF_INIT;
> >  
> > +   if (!strcmp(service_name, "git-upload-archive")) {
> > +   /* git-upload-archive doesn't need --stateless-rpc */
> > +   argv[1] = ".";
> > +   argv[2] = NULL;
> > +   }
> > +
> > strbuf_reset();
> > strbuf_addf(, "application/x-git-%s-request", svc->name);
> > check_content_type(hdr, buf.buf);
> > @@ -713,7 +720,8 @@ static struct service_cmd {
> > {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
> >  
> > {"POST", "/git-upload-pack$", service_rpc},
> > -   {"POST", "/git-receive-pack$", service_rpc}
> > +   {"POST", "/git-receive-pack$", service_rpc},
> > +   {"POST", "/git-upload-archive$", service_rpc},
> >  };
> >  
> >  static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
> > diff --git a/transport-helper.c b/transport-helper.c
> > index 143ca008c..b4b96fc89 100644
> > --- a/transport-helper.c
> > +++ b/transport-helper.c
> > @@ -605,7 +605,8 @@ static int process_connect_service(struct transport 
> > *transport,
> > ret = run_connect(transport, );
> > } else if (data->stateless_connect &&
> >(get_protocol_version_config() == protocol_v2) &&
> > -  !strcmp("git-upload-pack", name)) {
> > +  (!strcmp("git-upload-pack", name) ||
> > +   !strcmp("git-upload-archive", name))) {
> > strbuf_addf(, "stateless-connect %s\n", name);
> > ret = run_connect(transport, );
> > if (ret)
> > @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, 
> > const char *name,
> >  
> > /* Get_helper so connect is inited. */
> > get_helper(transport);
> > -   if (!data->connect)
> > +   if (!data->connect && !data->stateless_connect)
> > die(_("operation not supported by protocol"));
> 
> This is somewhat curious.  The upload-pack going over HTTP also is
> triggered by the same "stateless-connect" remote helper command, as
> we just saw in the previous hunk, and that support is not new.  Why
> do we need this change then?  What's different between the "data"
> that is obtained by get_helper(transport) for driving upload-pack
> and upload-archive?  Presumably upload-pack was working without this
> change because it also sets the connect bit on, and upload-archive
> does not work without this change because it does not?  Why do these
> two behave differently?

The data struct is not different between upload-pack vs. upload-archive.
Neither of them set the connect bit. The difference is that upload-pack
doesn't need to call transport_connect(), so it never hits
connect_helper().


Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support

2018-09-27 Thread Josh Steadmon
On 2018.09.27 11:20, Stefan Beller wrote:
> On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon  wrote:
> >
> > This is the second version of my series to add a new protocol v2 command
> > for archiving, with support for HTTP(S).
> >
> > NEEDSWORK: a server built with this series is not backwards-compatible
> > with clients that set GIT_PROTOCOL=version=2 or configure
> > protocol.version=2. The old client will unconditionally send "argument
> > ..." packet lines, which breaks the server's expectations of a
> > "command=archive" request,
> 
> So if an old client sets protocol to v2, it would only apply that
> protocol version
> to fetch, not archive, so it would start following a v0 conversation, but
> as the protocol version is set, it would be transmitted to the server.
> This sounds like a bug in the client?

Yeah, basically. We're telling the server we support v2, even if the
specific operation we're trying to do doesn't have a v2 implementation
on the client. So this is going to make it ugly to replace existing
commands.

> >  while the server's capability advertisement
> > in turn breaks the clients expectation of either an ACK or NACK.
> 
> Could a modern client send either another protocol version (3?)
> or a special capability along the protocol version ("fixed_archive")
> 
> > I've been discussing workarounds for this with Jonathan Nieder, but
> > please let me know if you have any suggestions for v3 of this series.
> 
> Care to open the discussion to the list? What are the different
> approaches, what are the pros/cons?

Jonathan suggested something along the lines of what you said above,
adding a new field in GIT_PROTOCOL. So we'd send something like
"version=2:archive_version=2" and have the server detect the latter.

I'm not sure if that's the best way to go about this since I'm not
familiar with the version detection code for other parts of the system.
I worry that it will lead us down the path of having to specify a
version for every command that we eventually convert to protocol v2. On
the other hand, I don't see any other way to work around this, at least
in the archive case. We can't peek at the client's transmissions on the
server, because v2 requires that the server speaks first...


[PATCH v2 4/4] archive: allow archive over HTTP(S) with proto v2

2018-09-26 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 builtin/archive.c  | 12 +++-
 http-backend.c | 13 -
 transport-helper.c |  7 ---
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index f91d222677..78a259518d 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -87,7 +87,17 @@ static int run_remote_archiver(int argc, const char **argv,
status = packet_reader_read();
if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
-   }
+   } else if (version == protocol_v2 &&
+  (starts_with(transport->url, "http://;) ||
+   starts_with(transport->url, "https://;)))
+   /*
+* Commands over HTTP require two requests, so there's an
+* additional server response to parse. We do only basic sanity
+* checking here that the versions presented match across
+* requests.
+*/
+   if (version != discover_version())
+   die(_("git archive: received different protocol 
versions in subsequent requests"));
 
/* Now, start reading from fd[0] and spit it out to stdout */
rv = recv_sideband("archive", fd[0], 1);
diff --git a/http-backend.c b/http-backend.c
index 9e894f197f..8e262d50e0 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -32,6 +32,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
{ "upload-pack", "uploadpack", 1, 1 },
{ "receive-pack", "receivepack", 0, -1 },
+   { "upload-archive", "uploadarchive", 1, 1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -637,6 +638,15 @@ static void service_rpc(struct strbuf *hdr, char 
*service_name)
struct rpc_service *svc = select_service(hdr, service_name);
struct strbuf buf = STRBUF_INIT;
 
+   if (!strcmp(service_name, "git-upload-archive")) {
+   /*
+* git-upload-archive doesn't need --stateless-rpc, because it
+* always handles only a single request.
+*/
+   argv[1] = ".";
+   argv[2] = NULL;
+   }
+
strbuf_reset();
strbuf_addf(, "application/x-git-%s-request", svc->name);
check_content_type(hdr, buf.buf);
@@ -713,7 +723,8 @@ static struct service_cmd {
{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
 
{"POST", "/git-upload-pack$", service_rpc},
-   {"POST", "/git-receive-pack$", service_rpc}
+   {"POST", "/git-receive-pack$", service_rpc},
+   {"POST", "/git-upload-archive$", service_rpc},
 };
 
 static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
diff --git a/transport-helper.c b/transport-helper.c
index 143ca008c8..c41c3dfcff 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -604,8 +604,9 @@ static int process_connect_service(struct transport 
*transport,
strbuf_addf(, "connect %s\n", name);
ret = run_connect(transport, );
} else if (data->stateless_connect &&
-  (get_protocol_version_config() == protocol_v2) &&
-  !strcmp("git-upload-pack", name)) {
+  get_protocol_version_config() == protocol_v2 &&
+  (!strcmp("git-upload-pack", name) ||
+   !strcmp("git-upload-archive", name))) {
strbuf_addf(, "stateless-connect %s\n", name);
ret = run_connect(transport, );
if (ret)
@@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, 
const char *name,
 
/* Get_helper so connect is inited. */
get_helper(transport);
-   if (!data->connect)
+   if (!data->connect && !data->stateless_connect)
die(_("operation not supported by protocol"));
 
if (!process_connect_service(transport, name, exec))
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 3/4] archive: implement protocol v2 archive command

2018-09-26 Thread Josh Steadmon
This adds a new archive command for protocol v2. The command expects
arguments in the form "argument X" which are passed unmodified to
git-upload-archive--writer.

This command works over the file://, Git, and SSH transports. HTTP
support will be added in a separate patch.

NEEDSWORK: this is not backwards-compatible with older clients that set
GIT_PROTOCOL=version=2 or configure protocol.version=2.

Signed-off-by: Josh Steadmon 
---
 Documentation/technical/protocol-v2.txt | 21 +++-
 builtin/archive.c   | 45 +
 builtin/upload-archive.c| 27 +--
 serve.c |  7 
 t/t5000-tar-tree.sh |  7 
 t/t5701-git-serve.sh|  1 +
 6 files changed, 90 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 09e4e0273f..6126fc5738 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -77,7 +77,8 @@ A v2 server would reply:
S: 
 
 Subsequent requests are then made directly to the service
-`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
+`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack or
+git-upload-archive).
 
  Capability Advertisement
 --
@@ -168,6 +169,24 @@ printable ASCII characters except space (i.e., the byte 
range 32 < x <
 and debugging purposes, and MUST NOT be used to programmatically assume
 the presence or absence of particular features.
 
+ archive
+~
+
+`archive` is the command to request an archive (such as a tarball) from a 
server
+over v2.
+
+archive takes the following optional arguments:
+
+argument 
+   This specifies that  should be passed as an argument to the
+   underlying archive writer.  This option can be repeated to pass
+   additional arguments to the archive writer.
+
+See linkgit:git-archive[1] for allowed values of ``.
+
+The output of archive is a packet-line stream of the resulting archive, with
+error messages transferred over the sideband channel.
+
  ls-refs
 ~
 
diff --git a/builtin/archive.c b/builtin/archive.c
index 4eb547c5b7..f91d222677 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -5,9 +5,11 @@
 #include "cache.h"
 #include "builtin.h"
 #include "archive.h"
+#include "connect.h"
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
}
 }
 
+static void do_v2_command_and_cap(int out)
+{
+   packet_write_fmt(out, "command=archive\n");
+   /* Capability list would go here, if we wanted to request any. */
+   packet_delim(out);
+}
+
 static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
@@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
struct remote *_remote;
struct packet_reader reader;
enum packet_read_status status;
+   enum protocol_version version;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
 
packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
 
+   version = discover_version();
+
+   if (version == protocol_v2 && server_supports_v2("archive", 0))
+   do_v2_command_and_cap(fd[1]);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   status = packet_reader_read();
-
-   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
-   die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(reader.line, "ACK")) {
-   if (starts_with(reader.line, "NACK "))
-   die(_("git archive: NACK %s"), reader.line + 5);
-   if (starts_with(reader.line, "ERR "))
-   die(_("remote error: %s"), reader.line + 4);
-   die(_("git archive: protocol error"));
+   if (version == protocol_v0) {
+   status = packet_reader_read();
+
+   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
+   die(_("git archive: expec

[PATCH v2 1/4] archive: follow test standards around assertions

2018-09-26 Thread Josh Steadmon
Move assertions outside of the check_tar function so that all top-level
code is wrapped in a test_expect_* assertion.

Signed-off-by: Josh Steadmon 
---
 t/t5000-tar-tree.sh | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 2a97b27b0a..c408e3a23d 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -62,11 +62,9 @@ check_tar() {
dir=$1
dir_with_prefix=$dir/$2
 
-   test_expect_success ' extract tar archive' '
-   (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile
-   '
+   (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile &&
 
-   test_expect_success TAR_NEEDS_PAX_FALLBACK ' interpret pax headers' '
+   if test_have_prereq TAR_NEEDS_PAX_FALLBACK ; then
(
cd $dir &&
for header in *.paxheader
@@ -82,16 +80,11 @@ check_tar() {
fi
done
)
-   '
+   fi &&
 
-   test_expect_success ' validate filenames' '
-   (cd ${dir_with_prefix}a && find .) | sort >$listfile &&
-   test_cmp a.lst $listfile
-   '
-
-   test_expect_success ' validate file contents' '
-   diff -r a ${dir_with_prefix}a
-   '
+   (cd ${dir_with_prefix}a && find .) | sort >$listfile &&
+   test_cmp a.lst $listfile &&
+   diff -r a ${dir_with_prefix}a
 }
 
 test_expect_success \
@@ -143,19 +136,20 @@ test_expect_success \
 'git archive' \
 'git archive HEAD >b.tar'
 
-check_tar b
+test_expect_success 'extract archive' 'check_tar b'
 
 test_expect_success 'git archive --prefix=prefix/' '
git archive --prefix=prefix/ HEAD >with_prefix.tar
 '
 
-check_tar with_prefix prefix/
+test_expect_success 'extract with prefix' 'check_tar with_prefix prefix/'
 
 test_expect_success 'git-archive --prefix=olde-' '
git archive --prefix=olde- HEAD >with_olde-prefix.tar
 '
 
-check_tar with_olde-prefix olde-
+test_expect_success 'extract with olde- prefix' \
+   'check_tar with_olde-prefix olde-'
 
 test_expect_success 'git archive on large files' '
 test_config core.bigfilethreshold 1 &&
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 2/4] archive: use packet_reader for communications

2018-09-26 Thread Josh Steadmon
Using packet_reader will simplify version detection and capability
handling, which will make implementation of protocol v2 support in
git-archive easier.

This refactoring does not change the behavior of "git archive".

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..4eb547c5b7 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
 {
-   char *buf;
int fd[2], i, rv;
struct transport *transport;
struct remote *_remote;
+   struct packet_reader reader;
+   enum packet_read_status status;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
transport = transport_get(_remote, _remote->url[0]);
transport_connect(transport, "git-upload-archive", exec, fd);
 
+   packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   buf = packet_read_line(fd[0], NULL);
-   if (!buf)
+   status = packet_reader_read();
+
+   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(buf, "ACK")) {
-   if (starts_with(buf, "NACK "))
-   die(_("git archive: NACK %s"), buf + 5);
-   if (starts_with(buf, "ERR "))
-   die(_("remote error: %s"), buf + 4);
+   if (strcmp(reader.line, "ACK")) {
+   if (starts_with(reader.line, "NACK "))
+   die(_("git archive: NACK %s"), reader.line + 5);
+   if (starts_with(reader.line, "ERR "))
+   die(_("remote error: %s"), reader.line + 4);
die(_("git archive: protocol error"));
}
 
-   if (packet_read_line(fd[0], NULL))
+   status = packet_reader_read();
+   if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
 
/* Now, start reading from fd[0] and spit it out to stdout */
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 0/4] Add proto v2 archive command with HTTP support

2018-09-26 Thread Josh Steadmon
This is the second version of my series to add a new protocol v2 command
for archiving, with support for HTTP(S).

NEEDSWORK: a server built with this series is not backwards-compatible
with clients that set GIT_PROTOCOL=version=2 or configure
protocol.version=2. The old client will unconditionally send "argument
..." packet lines, which breaks the server's expectations of a
"command=archive" request, while the server's capability advertisement
in turn breaks the clients expectation of either an ACK or NACK.

I've been discussing workarounds for this with Jonathan Nieder, but
please let me know if you have any suggestions for v3 of this series.


Josh Steadmon (4):
  archive: follow test standards around assertions
  archive: use packet_reader for communications
  archive: implement protocol v2 archive command
  archive: allow archive over HTTP(S) with proto v2

 Documentation/technical/protocol-v2.txt | 21 -
 builtin/archive.c   | 58 +++--
 builtin/upload-archive.c| 27 ++--
 http-backend.c  | 13 +-
 serve.c |  7 +++
 t/t5000-tar-tree.sh | 33 +++---
 t/t5701-git-serve.sh|  1 +
 transport-helper.c  |  7 +--
 8 files changed, 130 insertions(+), 37 deletions(-)

Range-diff against v1:
-:  -- > 1:  c2e371ad24 archive: follow test standards around assertions
1:  b514184273 ! 2:  a65f73f627 archive: use packet_reader for communications
@@ -6,7 +6,10 @@
 handling, which will make implementation of protocol v2 support in
 git-archive easier.
 
+This refactoring does not change the behavior of "git archive".
    +
 Signed-off-by: Josh Steadmon 
+Reviewed-by: Stefan Beller 
 
  diff --git a/builtin/archive.c b/builtin/archive.c
@@ -42,24 +45,24 @@
 -  if (!buf)
 +  status = packet_reader_read();
 +
-+  if (status == PACKET_READ_FLUSH)
++  if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
die(_("git archive: expected ACK/NAK, got a flush packet"));
 -  if (strcmp(buf, "ACK")) {
 -  if (starts_with(buf, "NACK "))
 -  die(_("git archive: NACK %s"), buf + 5);
 -  if (starts_with(buf, "ERR "))
 -  die(_("remote error: %s"), buf + 4);
-+  if (strcmp(reader.buffer, "ACK")) {
-+  if (starts_with(reader.buffer, "NACK "))
-+  die(_("git archive: NACK %s"), reader.buffer + 5);
-+  if (starts_with(reader.buffer, "ERR "))
-+  die(_("remote error: %s"), reader.buffer + 4);
++  if (strcmp(reader.line, "ACK")) {
++  if (starts_with(reader.line, "NACK "))
++  die(_("git archive: NACK %s"), reader.line + 5);
++  if (starts_with(reader.line, "ERR "))
++  die(_("remote error: %s"), reader.line + 4);
die(_("git archive: protocol error"));
}
  
 -  if (packet_read_line(fd[0], NULL))
 +  status = packet_reader_read();
-+  if (status != PACKET_READ_FLUSH)
++  if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
  
/* Now, start reading from fd[0] and spit it out to stdout */
2:  1518c15dc1 < -:  -- archive: implement protocol v2 archive command
-:  -- > 3:  0a8cc5e331 archive: implement protocol v2 archive command
3:  1b7ad8d8f6 ! 4:  97a1424f32 archive: allow archive over HTTP(S) with proto 
v2
@@ -10,16 +10,20 @@
  +++ b/builtin/archive.c
 @@
status = packet_reader_read();
-   if (status != PACKET_READ_FLUSH)
+   if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
 -  }
 +  } else if (version == protocol_v2 &&
-+ starts_with(transport->url, "http"))
++ (starts_with(transport->url, "http://;) ||
++  starts_with(transport->url, "https://;)))
 +  /*
 +   * Commands over HTTP require two requests, so there's an
-+   * additional server response to parse.
++   * additional server response to parse. We do only basic sanity
++   * checking here that the versions presented match across
++   * requests.
 +   */
-+  discover_version();
++  if (version != discover_version())
++  die(_("git archive: received different 

[PATCH 2/3] archive: implement protocol v2 archive command

2018-09-11 Thread Josh Steadmon
This adds a new archive command for protocol v2. The command expects
arguments in the form "argument X" which are passed unmodified to
git-upload-archive--writer.

This command works over the file://, Git, and SSH transports. HTTP
support will be added in a separate patch.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c| 45 +++-
 builtin/upload-archive.c | 44 ---
 t/t5000-tar-tree.sh  |  5 +
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e54fc39ad..73831887d 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -5,9 +5,11 @@
 #include "cache.h"
 #include "builtin.h"
 #include "archive.h"
+#include "connect.h"
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
}
 }
 
+static int do_v2_command_and_cap(int out)
+{
+   packet_write_fmt(out, "command=archive\n");
+   /* Capability list would go here, if we had any. */
+   packet_delim(out);
+}
+
 static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
@@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
struct remote *_remote;
struct packet_reader reader;
enum packet_read_status status;
+   enum protocol_version version;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
 
packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
 
+   version = discover_version();
+
+   if (version == protocol_v2)
+   do_v2_command_and_cap(fd[1]);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   status = packet_reader_read();
-
-   if (status == PACKET_READ_FLUSH)
-   die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(reader.buffer, "ACK")) {
-   if (starts_with(reader.buffer, "NACK "))
-   die(_("git archive: NACK %s"), reader.buffer + 5);
-   if (starts_with(reader.buffer, "ERR "))
-   die(_("remote error: %s"), reader.buffer + 4);
-   die(_("git archive: protocol error"));
+   if (version == protocol_v0) {
+   status = packet_reader_read();
+
+   if (status == PACKET_READ_FLUSH)
+   die(_("git archive: expected ACK/NAK, got a flush 
packet"));
+   if (strcmp(reader.buffer, "ACK")) {
+   if (starts_with(reader.buffer, "NACK "))
+   die(_("git archive: NACK %s"), reader.buffer + 
5);
+   if (starts_with(reader.buffer, "ERR "))
+   die(_("remote error: %s"), reader.buffer + 4);
+   die(_("git archive: protocol error"));
+   }
+
+   status = packet_reader_read();
+   if (status != PACKET_READ_FLUSH)
+   die(_("git archive: expected a flush"));
}
 
-   status = packet_reader_read();
-   if (status != PACKET_READ_FLUSH)
-   die(_("git archive: expected a flush"));
-
/* Now, start reading from fd[0] and spit it out to stdout */
rv = recv_sideband("archive", fd[0], 1);
rv |= transport_disconnect(transport);
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 25d911635..534e8fd56 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -5,6 +5,7 @@
 #include "builtin.h"
 #include "archive.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 #include "run-command.h"
 #include "argv-array.h"
@@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
return sz;
 }
 
+static int handle_v2_command_and_cap(void)
+{
+   struct packet_reader reader;
+   enum packet_read_status status;
+
+   packet_reader_init(, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
+   packet_write_fmt(1, "versi

[PATCH 3/3] archive: allow archive over HTTP(S) with proto v2

2018-09-11 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 builtin/archive.c  |  8 +++-
 http-backend.c | 10 +-
 transport-helper.c |  5 +++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index 73831887d..5fa75b3f7 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
status = packet_reader_read();
if (status != PACKET_READ_FLUSH)
die(_("git archive: expected a flush"));
-   }
+   } else if (version == protocol_v2 &&
+  starts_with(transport->url, "http"))
+   /*
+* Commands over HTTP require two requests, so there's an
+* additional server response to parse.
+*/
+   discover_version();
 
/* Now, start reading from fd[0] and spit it out to stdout */
rv = recv_sideband("archive", fd[0], 1);
diff --git a/http-backend.c b/http-backend.c
index 458642ef7..d62d583c7 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -32,6 +32,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
{ "upload-pack", "uploadpack", 1, 1 },
{ "receive-pack", "receivepack", 0, -1 },
+   { "upload-archive", "uploadarchive", 1, 1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char 
*service_name)
struct rpc_service *svc = select_service(hdr, service_name);
struct strbuf buf = STRBUF_INIT;
 
+   if (!strcmp(service_name, "git-upload-archive")) {
+   /* git-upload-archive doesn't need --stateless-rpc */
+   argv[1] = ".";
+   argv[2] = NULL;
+   }
+
strbuf_reset();
strbuf_addf(, "application/x-git-%s-request", svc->name);
check_content_type(hdr, buf.buf);
@@ -713,7 +720,8 @@ static struct service_cmd {
{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
 
{"POST", "/git-upload-pack$", service_rpc},
-   {"POST", "/git-receive-pack$", service_rpc}
+   {"POST", "/git-receive-pack$", service_rpc},
+   {"POST", "/git-upload-archive$", service_rpc},
 };
 
 static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
diff --git a/transport-helper.c b/transport-helper.c
index 143ca008c..b4b96fc89 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -605,7 +605,8 @@ static int process_connect_service(struct transport 
*transport,
ret = run_connect(transport, );
} else if (data->stateless_connect &&
   (get_protocol_version_config() == protocol_v2) &&
-  !strcmp("git-upload-pack", name)) {
+  (!strcmp("git-upload-pack", name) ||
+   !strcmp("git-upload-archive", name))) {
strbuf_addf(, "stateless-connect %s\n", name);
ret = run_connect(transport, );
if (ret)
@@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, 
const char *name,
 
/* Get_helper so connect is inited. */
get_helper(transport);
-   if (!data->connect)
+   if (!data->connect && !data->stateless_connect)
die(_("operation not supported by protocol"));
 
if (!process_connect_service(transport, name, exec))
-- 
2.19.0.397.gdd90340f6a-goog



Add proto v2 archive command with HTTP support

2018-09-11 Thread Josh Steadmon
This series adds a new protocol v2 command for archiving, and allows
this command to work over HTTP(S). This was previously discussed in [1].
I've CCed everyone who participated in that discussion.

[1]: 
https://public-inbox.org/git/CANq=j3tk7qebjoc7vnwkh4+wbnibmjjp5yukd9te5naywuk...@mail.gmail.com/




[PATCH 1/3] archive: use packet_reader for communications

2018-09-11 Thread Josh Steadmon
Using packet_reader will simplify version detection and capability
handling, which will make implementation of protocol v2 support in
git-archive easier.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f67539..e54fc39ad 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
 {
-   char *buf;
int fd[2], i, rv;
struct transport *transport;
struct remote *_remote;
+   struct packet_reader reader;
+   enum packet_read_status status;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
transport = transport_get(_remote, _remote->url[0]);
transport_connect(transport, "git-upload-archive", exec, fd);
 
+   packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   buf = packet_read_line(fd[0], NULL);
-   if (!buf)
+   status = packet_reader_read();
+
+   if (status == PACKET_READ_FLUSH)
die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(buf, "ACK")) {
-   if (starts_with(buf, "NACK "))
-   die(_("git archive: NACK %s"), buf + 5);
-   if (starts_with(buf, "ERR "))
-   die(_("remote error: %s"), buf + 4);
+   if (strcmp(reader.buffer, "ACK")) {
+   if (starts_with(reader.buffer, "NACK "))
+   die(_("git archive: NACK %s"), reader.buffer + 5);
+   if (starts_with(reader.buffer, "ERR "))
+   die(_("remote error: %s"), reader.buffer + 4);
die(_("git archive: protocol error"));
}
 
-   if (packet_read_line(fd[0], NULL))
+   status = packet_reader_read();
+   if (status != PACKET_READ_FLUSH)
die(_("git archive: expected a flush"));
 
/* Now, start reading from fd[0] and spit it out to stdout */
-- 
2.19.0.397.gdd90340f6a-goog



[PATCH v2] config: document value 2 for protocol.version

2018-09-10 Thread Josh Steadmon
From: Brandon Williams 

Update the config documentation to note the value `2` as an acceptable
value for the protocol.version config.

Signed-off-by: Brandon Williams 
Signed-off-by: Josh Steadmon 
---
 Documentation/config.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index eb66a1197..ee3b5dd8e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2828,6 +2828,8 @@ protocol.version::
 * `1` - the original wire protocol with the addition of a version string
   in the initial response from the server.
 
+* `2` - link:technical/protocol-v2.html[wire protocol version 2].
+
 --
 
 pull.ff::
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



Proposed approaches to supporting HTTP remotes in "git archive"

2018-07-27 Thread Josh Steadmon
# Supporting HTTP remotes in "git archive"

We would like to allow remote archiving from HTTP servers. There are a
few possible implementations to be discussed:

## Shallow clone to temporary repo

This approach builds on existing endpoints. Clients will connect to the
remote server's git-upload-pack service and fetch a shallow clone of the
requested commit into a temporary local repo. The write_archive()
function is then called on the local clone to write out the requested
archive.

### Benefits

* This can be implemented entirely in builtin/archive.c. No new service
  endpoints or server code are required.

* The archive is generated and compressed on the client side. This
  reduces CPU load on the server (for compressed archives) which would
   otherwise be a potential DoS vector.

* This provides a git-native way to archive any HTTP servers that
  support the git-upload-pack service; some providers (including GitHub)
  do not currently allow the git-upload-archive service.

### Drawbacks

* Archives generated remotely may not be bit-for-bit identical compared
  to those generated locally, if the versions of git used on the client
  and on the server differ.

* This requires higher bandwidth compared to transferring a compressed
  archive generated on the server.


## Use git-upload-archive

This approach requires adding support for the git-upload-archive
endpoint to the HTTP backend. Clients will connect to the remote
server's git-upload-archive service and the server will generate the
archive which is then delivered to the client.

### Benefits

* Matches existing "git archive" behavior for other remotes.

* Requires less bandwidth to send a compressed archive than a shallow
  clone.

* Resulting archive does not depend in any way on the client
  implementation.

### Drawbacks

* Implementation is more complicated; it will require changes to (at
  least) builtin/archive.c, http-backend.c, and
  builtin/upload-archive.c.

* Generates more CPU load on the server when compressing archives. This
  is potentially a DoS vector.

* Does not allow archiving from servers that don't support the
  git-upload-archive service.


## Add a new protocol v2 "archive" command

I am still a bit hazy on the exact details of this approach, please
forgive any inaccuracies (I'm a new contributor and haven't examined
custom v2 commands in much detail yet).

This approach builds off the existing v2 upload-pack endpoint. The
client will issue an archive command (with options to select particular
paths or a tree-ish). The server will generate the archive and deliver
it to the client.

### Benefits

* Requires less bandwidth to send a compressed archive than a shallow
  clone.

* Resulting archive does not depend in any way on the client
  implementation.

### Drawbacks

* Generates more CPU load on the server when compressing archives. This
  is potentially a DoS vector.

* Servers must support the v2 protocol (although the client could
  potentially fallback to some other supported remote archive
   functionality).

### Unknowns

* I am not clear on the relative complexity of this approach compared to
  the others, and would appreciate any guidance offered.


## Summary

Personally, I lean towards the first approach. It could give us an
opportunity to remove server-side complexity; there is no reason that
the shallow-clone approach must be restricted to the HTTP transport, and
we could re-implement other transports using this method.  Additionally,
it would allow clients to pull archives from remotes that would not
otherwise support it.

That said, I am happy to work on whichever approach the community deems
most worthwhile.


[PATCH] protocol-v2 doc: put HTTP headers after request

2018-06-22 Thread Josh Steadmon
HTTP servers return 400 if you send headers before the GET request.

Signed-off-by: Josh Steadmon 
---
 Documentation/technical/protocol-v2.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 49bda76d2..f58f24b1e 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -64,9 +64,8 @@ When using the http:// or https:// transport a client makes a 
"smart"
 info/refs request as described in `http-protocol.txt` and requests that
 v2 be used by supplying "version=2" in the `Git-Protocol` header.
 
-   C: Git-Protocol: version=2
-   C:
C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+   C: Git-Protocol: version=2
 
 A v2 server would reply:
 
-- 
2.18.0.rc2.346.g013aa6912e-goog