[PATCH v3 0/3] Add commit-graph fuzzer and fix buffer overflow
Ad a new fuzz test for the commit graph and fix a buffer read-overflow that it discovered. Additionally, fix the Makefile instructions for building fuzzers. Changes since V2: * Avoid pointer arithmetic overflow when checking the graph's chunk count. * Merge the corrupt_graph_and_verify and corrupt_and_zero_graph_then_verify test functions. Josh Steadmon (3): commit-graph, fuzz: Add fuzzer for commit-graph commit-graph: fix buffer read-overflow Makefile: correct example fuzz build .gitignore | 1 + Makefile| 3 +- commit-graph.c | 67 + commit-graph.h | 3 ++ fuzz-commit-graph.c | 16 ++ t/t5318-commit-graph.sh | 15 +++-- 6 files changed, 83 insertions(+), 22 deletions(-) create mode 100644 fuzz-commit-graph.c Range-diff against v2: 1: af45c2337f ! 1: 675d58ecea commit-graph: fix buffer read-overflow @@ -22,8 +22,8 @@ + uint64_t chunk_offset; int chunk_repeated = 0; -+ if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > -+ data + graph_size) { ++ if (data + graph_size - chunk_lookup < ++ GRAPH_CHUNKLOOKUP_WIDTH) { + error(_("chunk lookup table entry missing; graph file may be incomplete")); + free(graph); + return NULL; @@ -40,31 +40,34 @@ --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ - test_i18ngrep "$grepstr" err - } + GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4)) + GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES)) -+ -+# usage: corrupt_and_zero_graph_then_verify -+# Manipulates the commit-graph file at by inserting the data, -+# then zeros the file starting at . Finally, runs -+# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err' -+# for the given string. -+corrupt_and_zero_graph_then_verify() { -+ corrupt_pos=$1 -+ data="${2:-\0}" -+ zero_pos=$3 -+ grepstr=$4 +-# usage: corrupt_graph_and_verify ++# usage: corrupt_graph_and_verify[] + # Manipulates the commit-graph file at the position +-# by inserting the data, then runs 'git commit-graph verify' ++# by inserting the data, optionally zeroing the file ++# starting at , then runs 'git commit-graph verify' + # and places the output in the file 'err'. Test 'err' for + # the given string. + corrupt_graph_and_verify() { + pos=$1 + data="${2:-\0}" + grepstr=$3 + orig_size=$(stat --format=%s $objdir/info/commit-graph) -+ cd "$TRASH_DIRECTORY/full" && -+ test_when_finished mv commit-graph-backup $objdir/info/commit-graph && -+ cp $objdir/info/commit-graph commit-graph-backup && -+ printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc && ++ zero_pos=${4:-${orig_size}} + cd "$TRASH_DIRECTORY/full" && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + cp $objdir/info/commit-graph commit-graph-backup && + printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc && + truncate --size=$zero_pos $objdir/info/commit-graph && + truncate --size=$orig_size $objdir/info/commit-graph && -+ test_must_fail git commit-graph verify 2>test_err && -+ grep -v "^+" test_err >err && -+ test_i18ngrep "$grepstr" err -+} + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err + test_i18ngrep "$grepstr" err + } + + test_expect_success 'detect bad signature' ' corrupt_graph_and_verify 0 "\0" \ @@ -73,9 +76,9 @@ "incorrect checksum" ' -+test_expect_success 'detect truncated graph' ' -+ corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \ -+ $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing" ++test_expect_success 'detect incorrect chunk count' ' ++ corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \ ++ "chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET +' + test_expect_success 'git fsck (checks commit-graph)' ' 2: 7519fc76df = 2: 06a32bfe8b Makefile: correct example fuzz build -- 2.20.0.rc2.12.g4c11c11dec
[PATCH v3 1/3] commit-graph, fuzz: Add fuzzer for commit-graph
Breaks load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing target. Since parse_commit_graph() is only called by load_commit_graph_one() (and the fuzzer described below), we omit error messages that would be duplicated by the caller. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). Signed-off-by: Josh Steadmon --- .gitignore | 1 + Makefile| 1 + commit-graph.c | 53 ++--- commit-graph.h | 3 +++ fuzz-commit-graph.c | 16 ++ 5 files changed, 57 insertions(+), 17 deletions(-) create mode 100644 fuzz-commit-graph.c diff --git a/.gitignore b/.gitignore index 0d77ea5894..8bcf153ed9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ +/fuzz-commit-graph /fuzz_corpora /fuzz-pack-headers /fuzz-pack-idx diff --git a/Makefile b/Makefile index 1a44c811aa..6b72f37c29 100644 --- a/Makefile +++ b/Makefile @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \ ETAGS_TARGET = TAGS +FUZZ_OBJS += fuzz-commit-graph.o FUZZ_OBJS += fuzz-pack-headers.o FUZZ_OBJS += fuzz-pack-idx.o diff --git a/commit-graph.c b/commit-graph.c index 40c855f185..07dd410f3c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r) struct commit_graph *load_commit_graph_one(const char *graph_file) { void *graph_map; - const unsigned char *data, *chunk_lookup; size_t graph_size; struct stat st; - uint32_t i; - struct commit_graph *graph; + struct commit_graph *ret; int fd = git_open(graph_file); - uint64_t last_chunk_offset; - uint32_t last_chunk_id; - uint32_t graph_signature; - unsigned char graph_version, hash_version; if (fd < 0) return NULL; @@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) die(_("graph file %s is too small"), graph_file); } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); + ret = parse_commit_graph(graph_map, fd, graph_size); + + if (!ret) { + munmap(graph_map, graph_size); + close(fd); + exit(1); + } + + return ret; +} + +struct commit_graph *parse_commit_graph(void *graph_map, int fd, + size_t graph_size) +{ + const unsigned char *data, *chunk_lookup; + uint32_t i; + struct commit_graph *graph; + uint64_t last_chunk_offset; + uint32_t last_chunk_id; + uint32_t graph_signature; + unsigned char graph_version, hash_version; + + if (!graph_map) + return NULL; + + if (graph_size < GRAPH_MIN_SIZE) + return NULL; + data = (const unsigned char *)graph_map; graph_signature = get_be32(data); if (graph_signature != GRAPH_SIGNATURE) { error(_("graph signature %X does not match signature %X"), graph_signature, GRAPH_SIGNATURE); - goto cleanup_fail; + return NULL; } graph_version = *(unsigned char*)(data + 4); if (graph_version != GRAPH_VERSION) { error(_("graph version %X does not match version %X"), graph_version, GRAPH_VERSION); - goto cleanup_fail; + return NULL; } hash_version = *(unsigned char*)(data + 5); if (hash_version != GRAPH_OID_VERSION) { error(_("hash version %X does not match version %X"), hash_version, GRAPH_OID_VERSION); - goto cleanup_fail; + return NULL; } graph = alloc_commit_graph(); @@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), (uint32_t)chunk_offset); - goto cleanup_fail; + free(graph); + return NULL; } switch (chunk_id) { @@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) if (chunk_repeated) { error(_("chunk id %08x appears multiple times"), chunk_id); - goto cleanup_fail; + free(graph); + return NULL; } if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP) @@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one
[PATCH v3 2/3] commit-graph: fix buffer read-overflow
fuzz-commit-graph identified a case where Git will read past the end of a buffer containing a commit graph if the graph's header has an incorrect chunk count. A simple bounds check in parse_commit_graph() prevents this. Signed-off-by: Josh Steadmon --- commit-graph.c | 14 -- t/t5318-commit-graph.sh | 15 +-- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 07dd410f3c..836d65a1d3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, last_chunk_offset = 8; chunk_lookup = data + 8; for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id = get_be32(chunk_lookup + 0); - uint64_t chunk_offset = get_be64(chunk_lookup + 4); + uint32_t chunk_id; + uint64_t chunk_offset; int chunk_repeated = 0; + if (data + graph_size - chunk_lookup < + GRAPH_CHUNKLOOKUP_WIDTH) { + error(_("chunk lookup table entry missing; graph file may be incomplete")); + free(graph); + return NULL; + } + + chunk_id = get_be32(chunk_lookup + 0); + chunk_offset = get_be64(chunk_lookup + 4); + chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 5fe21db99f..5b6b44b78e 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -366,24 +366,30 @@ GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \ GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4)) GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES)) -# usage: corrupt_graph_and_verify +# usage: corrupt_graph_and_verify[] # Manipulates the commit-graph file at the position -# by inserting the data, then runs 'git commit-graph verify' +# by inserting the data, optionally zeroing the file +# starting at , then runs 'git commit-graph verify' # and places the output in the file 'err'. Test 'err' for # the given string. corrupt_graph_and_verify() { pos=$1 data="${2:-\0}" grepstr=$3 + orig_size=$(stat --format=%s $objdir/info/commit-graph) + zero_pos=${4:-${orig_size}} cd "$TRASH_DIRECTORY/full" && test_when_finished mv commit-graph-backup $objdir/info/commit-graph && cp $objdir/info/commit-graph commit-graph-backup && printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc && + truncate --size=$zero_pos $objdir/info/commit-graph && + truncate --size=$orig_size $objdir/info/commit-graph && test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err test_i18ngrep "$grepstr" err } + test_expect_success 'detect bad signature' ' corrupt_graph_and_verify 0 "\0" \ "graph signature" @@ -484,6 +490,11 @@ test_expect_success 'detect invalid checksum hash' ' "incorrect checksum" ' +test_expect_success 'detect incorrect chunk count' ' + corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \ + "chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET +' + test_expect_success 'git fsck (checks commit-graph)' ' cd "$TRASH_DIRECTORY/full" && git fsck && -- 2.20.0.rc2.12.g4c11c11dec
[PATCH v3 3/3] Makefile: correct example fuzz build
Signed-off-by: Josh Steadmon --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 6b72f37c29..bbcfc2bc9f 100644 --- a/Makefile +++ b/Makefile @@ -3104,7 +3104,7 @@ cover_db_html: cover_db # An example command to build against libFuzzer from LLVM 4.0.0: # # make CC=clang CXX=clang++ \ -# FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ +# CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ # LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \ # fuzz-all # -- 2.20.0.rc2.12.g4c11c11dec
Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
On 2018.11.28 16:27, Stefan Beller wrote: > This is a resend of sb/submodule-recursive-fetch-gets-the-tip, > with all feedback addressed. As it took some time, I'll send it > without range-diff, but would ask for full review. > > I plan on resending after the next release as this got delayed quite a bit, > which is why I also rebased it to master. > > Thanks, > Stefan I am not very familiar with most of the submodule code, but for what it's worth, this entire series looks good to me. I'll note that most of the commits caused some style complaints, but I'll leave it up to your judgement as to whether they're valid or not. Reviewed-by: Josh Steadmon
[PATCH v2 2/3] commit-graph: fix buffer read-overflow
fuzz-commit-graph identified a case where Git will read past the end of a buffer containing a commit graph if the graph's header has an incorrect chunk count. A simple bounds check in parse_commit_graph() prevents this. Signed-off-by: Josh Steadmon --- commit-graph.c | 14 -- t/t5318-commit-graph.sh | 28 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 07dd410f3c..224a5f161e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, last_chunk_offset = 8; chunk_lookup = data + 8; for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id = get_be32(chunk_lookup + 0); - uint64_t chunk_offset = get_be64(chunk_lookup + 4); + uint32_t chunk_id; + uint64_t chunk_offset; int chunk_repeated = 0; + if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > + data + graph_size) { + error(_("chunk lookup table entry missing; graph file may be incomplete")); + free(graph); + return NULL; + } + + chunk_id = get_be32(chunk_lookup + 0); + chunk_offset = get_be64(chunk_lookup + 4); + chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 5fe21db99f..2503cb0345 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -384,6 +384,29 @@ corrupt_graph_and_verify() { test_i18ngrep "$grepstr" err } + +# usage: corrupt_and_zero_graph_then_verify +# Manipulates the commit-graph file at by inserting the data, +# then zeros the file starting at . Finally, runs +# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err' +# for the given string. +corrupt_and_zero_graph_then_verify() { + corrupt_pos=$1 + data="${2:-\0}" + zero_pos=$3 + grepstr=$4 + orig_size=$(stat --format=%s $objdir/info/commit-graph) + cd "$TRASH_DIRECTORY/full" && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + cp $objdir/info/commit-graph commit-graph-backup && + printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc && + truncate --size=$zero_pos $objdir/info/commit-graph && + truncate --size=$orig_size $objdir/info/commit-graph && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + test_i18ngrep "$grepstr" err +} + test_expect_success 'detect bad signature' ' corrupt_graph_and_verify 0 "\0" \ "graph signature" @@ -484,6 +507,11 @@ test_expect_success 'detect invalid checksum hash' ' "incorrect checksum" ' +test_expect_success 'detect truncated graph' ' + corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \ + $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing" +' + test_expect_success 'git fsck (checks commit-graph)' ' cd "$TRASH_DIRECTORY/full" && git fsck && -- 2.20.0.rc2.10.g7519fc76df
[PATCH v2 3/3] Makefile: correct example fuzz build
Signed-off-by: Josh Steadmon --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 6b72f37c29..bbcfc2bc9f 100644 --- a/Makefile +++ b/Makefile @@ -3104,7 +3104,7 @@ cover_db_html: cover_db # An example command to build against libFuzzer from LLVM 4.0.0: # # make CC=clang CXX=clang++ \ -# FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ +# CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ # LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \ # fuzz-all # -- 2.20.0.rc2.10.g7519fc76df
[PATCH v2 1/3] commit-graph, fuzz: Add fuzzer for commit-graph
Breaks load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing target. Since parse_commit_graph() is only called by load_commit_graph_one() (and the fuzzer described below), we omit error messages that would be duplicated by the caller. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). Signed-off-by: Josh Steadmon --- .gitignore | 1 + Makefile| 1 + commit-graph.c | 53 ++--- commit-graph.h | 3 +++ fuzz-commit-graph.c | 16 ++ 5 files changed, 57 insertions(+), 17 deletions(-) create mode 100644 fuzz-commit-graph.c diff --git a/.gitignore b/.gitignore index 0d77ea5894..8bcf153ed9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ +/fuzz-commit-graph /fuzz_corpora /fuzz-pack-headers /fuzz-pack-idx diff --git a/Makefile b/Makefile index 1a44c811aa..6b72f37c29 100644 --- a/Makefile +++ b/Makefile @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \ ETAGS_TARGET = TAGS +FUZZ_OBJS += fuzz-commit-graph.o FUZZ_OBJS += fuzz-pack-headers.o FUZZ_OBJS += fuzz-pack-idx.o diff --git a/commit-graph.c b/commit-graph.c index 40c855f185..07dd410f3c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r) struct commit_graph *load_commit_graph_one(const char *graph_file) { void *graph_map; - const unsigned char *data, *chunk_lookup; size_t graph_size; struct stat st; - uint32_t i; - struct commit_graph *graph; + struct commit_graph *ret; int fd = git_open(graph_file); - uint64_t last_chunk_offset; - uint32_t last_chunk_id; - uint32_t graph_signature; - unsigned char graph_version, hash_version; if (fd < 0) return NULL; @@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) die(_("graph file %s is too small"), graph_file); } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); + ret = parse_commit_graph(graph_map, fd, graph_size); + + if (!ret) { + munmap(graph_map, graph_size); + close(fd); + exit(1); + } + + return ret; +} + +struct commit_graph *parse_commit_graph(void *graph_map, int fd, + size_t graph_size) +{ + const unsigned char *data, *chunk_lookup; + uint32_t i; + struct commit_graph *graph; + uint64_t last_chunk_offset; + uint32_t last_chunk_id; + uint32_t graph_signature; + unsigned char graph_version, hash_version; + + if (!graph_map) + return NULL; + + if (graph_size < GRAPH_MIN_SIZE) + return NULL; + data = (const unsigned char *)graph_map; graph_signature = get_be32(data); if (graph_signature != GRAPH_SIGNATURE) { error(_("graph signature %X does not match signature %X"), graph_signature, GRAPH_SIGNATURE); - goto cleanup_fail; + return NULL; } graph_version = *(unsigned char*)(data + 4); if (graph_version != GRAPH_VERSION) { error(_("graph version %X does not match version %X"), graph_version, GRAPH_VERSION); - goto cleanup_fail; + return NULL; } hash_version = *(unsigned char*)(data + 5); if (hash_version != GRAPH_OID_VERSION) { error(_("hash version %X does not match version %X"), hash_version, GRAPH_OID_VERSION); - goto cleanup_fail; + return NULL; } graph = alloc_commit_graph(); @@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), (uint32_t)chunk_offset); - goto cleanup_fail; + free(graph); + return NULL; } switch (chunk_id) { @@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) if (chunk_repeated) { error(_("chunk id %08x appears multiple times"), chunk_id); - goto cleanup_fail; + free(graph); + return NULL; } if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP) @@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one
[PATCH v2 0/3] Add commit-graph fuzzer and fix buffer overflow
Add a new fuzz test for the commit graph and fix a buffer read-overflow that it discovered. Additionally, fix the Makefile instructions for building fuzzers. Changes since V1: * Moved the parse_commit_graph() declaration to the header file, since we don't mind if others use it. * Moved some unnecessary comments into commit messages. * Fixed some style issues. * Added a test case for detecting commit graphs with missing chunk lookup entries. * Ævar's comments on the Makefile made me realize the fuzzer build instructions were using the wrong variable. Added a new commit to fix this. Josh Steadmon (3): commit-graph, fuzz: Add fuzzer for commit-graph commit-graph: fix buffer read-overflow Makefile: correct example fuzz build .gitignore | 1 + Makefile| 3 +- commit-graph.c | 67 + commit-graph.h | 3 ++ fuzz-commit-graph.c | 16 ++ t/t5318-commit-graph.sh | 28 + 6 files changed, 98 insertions(+), 20 deletions(-) create mode 100644 fuzz-commit-graph.c Range-diff against v1: 1: 53e62baaa8 ! 1: 0b57ecbe1b commit-graph, fuzz: Add fuzzer for commit-graph @@ -4,7 +4,9 @@ Breaks load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, -which makes it suitable as a fuzzing target. +which makes it suitable as a fuzzing target. Since parse_commit_graph() +is only called by load_commit_graph_one() (and the fuzzer described +below), we omit error messages that would be duplicated by the caller. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). @@ -35,17 +37,6 @@ diff --git a/commit-graph.c b/commit-graph.c --- a/commit-graph.c +++ b/commit-graph.c -@@ - #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ - + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN) - -+struct commit_graph *parse_commit_graph(void *graph_map, int fd, -+ size_t graph_size); -+ -+ - char *get_commit_graph_filename(const char *obj_dir) - { - return xstrfmt("%s/info/commit-graph", obj_dir); @@ struct commit_graph *load_commit_graph_one(const char *graph_file) { @@ -70,7 +61,7 @@ graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); + ret = parse_commit_graph(graph_map, fd, graph_size); + -+ if (ret == NULL) { ++ if (!ret) { + munmap(graph_map, graph_size); + close(fd); + exit(1); @@ -79,10 +70,6 @@ + return ret; +} + -+/* -+ * This function is intended to be used only from load_commit_graph_one() or in -+ * fuzz tests. -+ */ +struct commit_graph *parse_commit_graph(void *graph_map, int fd, + size_t graph_size) +{ @@ -94,11 +81,9 @@ + uint32_t graph_signature; + unsigned char graph_version, hash_version; + -+ /* -+ * This should already be checked in load_commit_graph_one, but we still -+ * need a check here for when we're calling parse_commit_graph directly -+ * from fuzz tests. We can omit the error message in that case. -+ */ ++ if (!graph_map) ++ return NULL; ++ + if (graph_size < GRAPH_MIN_SIZE) + return NULL; + @@ -162,12 +147,25 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir) + diff --git a/commit-graph.h b/commit-graph.h + --- a/commit-graph.h + +++ b/commit-graph.h +@@ + + struct commit_graph *load_commit_graph_one(const char *graph_file); + ++struct commit_graph *parse_commit_graph(void *graph_map, int fd, ++ size_t graph_size); ++ + /* + * Return 1 if and only if the repository has a commit-graph + * file and generation numbers are computed in that file. + diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c new file mode 100644 --- /dev/null +++ b/fuzz-commit-graph.c @@ -+#include "object-store.h" +#include "commit-graph.h" + +struct commit_graph *parse_commit_graph(void *graph_map, int fd, @@ -179,9 +177,8 @@ +{ + struct commit_graph *g; + -+ g = parse_commit_graph((void *) data, -1, size); -+ if (g) -+ free(g); ++ g = parse_commit_graph((void *)data, -1, size); ++ free(g); + + return 0; +} 2: ad2e761f44 ! 2: af45c2337f commit-graph: fix buffer read-overflow @@ -22,7 +22,8 @@ + uint64_t chunk_offset;
Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph
On 2018.12.05 23:48, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Dec 05 2018, Josh Steadmon wrote: > > > Breaks load_commit_graph_one() into a new function, > > parse_commit_graph(). The latter function operates on arbitrary buffers, > > which makes it suitable as a fuzzing target. > > > > Adds fuzz-commit-graph.c, which provides a fuzzing entry point > > compatible with libFuzzer (and possibly other fuzzing engines). > > > > Signed-off-by: Josh Steadmon > > --- > > .gitignore | 1 + > > Makefile| 1 + > > commit-graph.c | 63 + > > fuzz-commit-graph.c | 18 + > > 4 files changed, 66 insertions(+), 17 deletions(-) > > create mode 100644 fuzz-commit-graph.c > > > > diff --git a/.gitignore b/.gitignore > > index 0d77ea5894..8bcf153ed9 100644 > > --- a/.gitignore > > +++ b/.gitignore > > @@ -1,3 +1,4 @@ > > +/fuzz-commit-graph > > /fuzz_corpora > > /fuzz-pack-headers > > /fuzz-pack-idx > > diff --git a/Makefile b/Makefile > > index 1a44c811aa..6b72f37c29 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \ > > > > ETAGS_TARGET = TAGS > > > > +FUZZ_OBJS += fuzz-commit-graph.o > > FUZZ_OBJS += fuzz-pack-headers.o > > FUZZ_OBJS += fuzz-pack-idx.o > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 40c855f185..0755359b1a 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -46,6 +46,10 @@ > > #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ > > + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN) > > > > +struct commit_graph *parse_commit_graph(void *graph_map, int fd, > > + size_t graph_size); > > + > > + > > char *get_commit_graph_filename(const char *obj_dir) > > { > > return xstrfmt("%s/info/commit-graph", obj_dir); > > @@ -84,16 +88,10 @@ static int commit_graph_compatible(struct repository *r) > > struct commit_graph *load_commit_graph_one(const char *graph_file) > > { > > void *graph_map; > > - const unsigned char *data, *chunk_lookup; > > size_t graph_size; > > struct stat st; > > - uint32_t i; > > - struct commit_graph *graph; > > + struct commit_graph *ret; > > int fd = git_open(graph_file); > > - uint64_t last_chunk_offset; > > - uint32_t last_chunk_id; > > - uint32_t graph_signature; > > - unsigned char graph_version, hash_version; > > > > if (fd < 0) > > return NULL; > > @@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char > > *graph_file) > > die(_("graph file %s is too small"), graph_file); > > } > > graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); > > + ret = parse_commit_graph(graph_map, fd, graph_size); > > + > > + if (ret == NULL) { > > Code in git usually uses just !ret. Will fix in V2, thanks. > > + munmap(graph_map, graph_size); > > + close(fd); > > + exit(1); > > Ouch, exit(1) from load_commit_graph_one()? Can't we return NULL here > instead? Nasty to exit from a library routine, but then I see later... > > > @@ -201,11 +235,6 @@ struct commit_graph *load_commit_graph_one(const char > > *graph_file) > > } > > > > return graph; > > - > > -cleanup_fail: > > - munmap(graph_map, graph_size); > > - close(fd); > > - exit(1); > > } > > ... ah, I see this is where you got the exit(1) from. So it was there > already. > > > static void prepare_commit_graph_one(struct repository *r, const char > > *obj_dir) > > diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c > > new file mode 100644 > > index 00..420851d0d2 > > --- /dev/null > > +++ b/fuzz-commit-graph.c > > @@ -0,0 +1,18 @@ > > +#include "object-store.h" > > +#include "commit-graph.h" > > + > > +struct commit_graph *parse_commit_graph(void *graph_map, int fd, > > + size_t graph_size); > > + > > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); > > + > > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) > > +{ > > + struct commit_graph *g; > > + > > + g = parse_commit_graph((void *) data, -1, size); > > + if (g) > > + free(g); > > + > > + return 0; > > +} > > I hadn't looked at this before, but see your 5e47215080 ("fuzz: add > basic fuzz testing target.", 2018-10-12) for some prior art. > > There's instructions there for a very long "make" invocation. Would be > nice if this were friendlier and we could just do "make test-fuzz" or > something... Yeah, the problem is that there are too many combinations of fuzzing engine, sanitizer, and compiler to make any reasonable default here. Even if you just stick with libFuzzer, address sanitizer, and clang, the flags change radically depending on which version of clang you're using.
[PATCH 2/2] commit-graph: fix buffer read-overflow
fuzz-commit-graph identified a case where Git will read past the end of a buffer containing a commit graph if the graph's header has an incorrect chunk count. A simple bounds check in parse_commit_graph() prevents this. Signed-off-by: Josh Steadmon Helped-by: Derrick Stolee --- commit-graph.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 0755359b1a..fee171a5f3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -175,10 +175,19 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, last_chunk_offset = 8; chunk_lookup = data + 8; for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id = get_be32(chunk_lookup + 0); - uint64_t chunk_offset = get_be64(chunk_lookup + 4); + uint32_t chunk_id; + uint64_t chunk_offset; int chunk_repeated = 0; + if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > data + graph_size) { + error(_("chunk lookup table entry missing; graph file may be incomplete")); + free(graph); + return NULL; + } + + chunk_id = get_be32(chunk_lookup + 0); + chunk_offset = get_be64(chunk_lookup + 4); + chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph
Breaks load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing target. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). Signed-off-by: Josh Steadmon --- .gitignore | 1 + Makefile| 1 + commit-graph.c | 63 + fuzz-commit-graph.c | 18 + 4 files changed, 66 insertions(+), 17 deletions(-) create mode 100644 fuzz-commit-graph.c diff --git a/.gitignore b/.gitignore index 0d77ea5894..8bcf153ed9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ +/fuzz-commit-graph /fuzz_corpora /fuzz-pack-headers /fuzz-pack-idx diff --git a/Makefile b/Makefile index 1a44c811aa..6b72f37c29 100644 --- a/Makefile +++ b/Makefile @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \ ETAGS_TARGET = TAGS +FUZZ_OBJS += fuzz-commit-graph.o FUZZ_OBJS += fuzz-pack-headers.o FUZZ_OBJS += fuzz-pack-idx.o diff --git a/commit-graph.c b/commit-graph.c index 40c855f185..0755359b1a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -46,6 +46,10 @@ #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN) +struct commit_graph *parse_commit_graph(void *graph_map, int fd, + size_t graph_size); + + char *get_commit_graph_filename(const char *obj_dir) { return xstrfmt("%s/info/commit-graph", obj_dir); @@ -84,16 +88,10 @@ static int commit_graph_compatible(struct repository *r) struct commit_graph *load_commit_graph_one(const char *graph_file) { void *graph_map; - const unsigned char *data, *chunk_lookup; size_t graph_size; struct stat st; - uint32_t i; - struct commit_graph *graph; + struct commit_graph *ret; int fd = git_open(graph_file); - uint64_t last_chunk_offset; - uint32_t last_chunk_id; - uint32_t graph_signature; - unsigned char graph_version, hash_version; if (fd < 0) return NULL; @@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) die(_("graph file %s is too small"), graph_file); } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); + ret = parse_commit_graph(graph_map, fd, graph_size); + + if (ret == NULL) { + munmap(graph_map, graph_size); + close(fd); + exit(1); + } + + return ret; +} + +/* + * This function is intended to be used only from load_commit_graph_one() or in + * fuzz tests. + */ +struct commit_graph *parse_commit_graph(void *graph_map, int fd, + size_t graph_size) +{ + const unsigned char *data, *chunk_lookup; + uint32_t i; + struct commit_graph *graph; + uint64_t last_chunk_offset; + uint32_t last_chunk_id; + uint32_t graph_signature; + unsigned char graph_version, hash_version; + + /* +* This should already be checked in load_commit_graph_one, but we still +* need a check here for when we're calling parse_commit_graph directly +* from fuzz tests. We can omit the error message in that case. +*/ + if (graph_size < GRAPH_MIN_SIZE) + return NULL; + data = (const unsigned char *)graph_map; graph_signature = get_be32(data); if (graph_signature != GRAPH_SIGNATURE) { error(_("graph signature %X does not match signature %X"), graph_signature, GRAPH_SIGNATURE); - goto cleanup_fail; + return NULL; } graph_version = *(unsigned char*)(data + 4); if (graph_version != GRAPH_VERSION) { error(_("graph version %X does not match version %X"), graph_version, GRAPH_VERSION); - goto cleanup_fail; + return NULL; } hash_version = *(unsigned char*)(data + 5); if (hash_version != GRAPH_OID_VERSION) { error(_("hash version %X does not match version %X"), hash_version, GRAPH_OID_VERSION); - goto cleanup_fail; + return NULL; } graph = alloc_commit_graph(); @@ -152,7 +184,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), (uint32_t)chunk_offset); - goto cleanup_fail; + free(graph); + return NULL; }
[PATCH 0/2] Add commit-graph fuzzer and fix buffer overflow
Add a new fuzz test for the commit graph and fix a buffer read-overflow that it discovered. Josh Steadmon (2): commit-graph, fuzz: Add fuzzer for commit-graph commit-graph: fix buffer read-overflow .gitignore | 1 + Makefile| 1 + commit-graph.c | 76 + fuzz-commit-graph.c | 18 +++ 4 files changed, 77 insertions(+), 19 deletions(-) create mode 100644 fuzz-commit-graph.c -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH v5 1/1] protocol: advertise multiple supported versions
Currently the client advertises that it supports the wire protocol version set in the protocol.version config. However, not all services support the same set of protocol versions. For example, git-receive-pack supports v1 and v0, but not v2. If a client connects to git-receive-pack and requests v2, it will instead be downgraded to v0. Other services, such as git-upload-archive, do not do any version negotiation checks. This patch creates a protocol version registry. Individual client and server programs register all the protocol versions they support prior to communicating with a remote instance. Versions should be listed in preference order; the version specified in protocol.version will automatically be moved to the front of the registry. The protocol version registry is passed to remote helpers via the GIT_PROTOCOL environment variable. Clients now advertise the full list of registered versions. Servers select the first allowed version from this advertisement. Additionally, remove special cases around advertising version=0. Previously we avoided adding version negotiation fields in server responses if it looked like the client wanted v0. However, including these fields does not change behavior, so it's better to have simpler code. While we're at it, remove unnecessary externs from function declarations in protocol.h. Signed-off-by: Josh Steadmon --- builtin/archive.c | 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c| 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c| 3 + builtin/upload-pack.c | 4 ++ connect.c | 52 +++ protocol.c | 122 +--- protocol.h | 23 ++- remote-curl.c | 27 +--- t/t5551-http-fetch-smart.sh | 1 + t/t5570-git-daemon.sh | 2 +- t/t5601-clone.sh| 38 +-- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 21 files changed, 256 insertions(+), 82 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e74f675390..8adb9f381b 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -8,6 +8,7 @@ #include "transport.h" #include "parse-options.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" static void create_output_file(const char *output_file) @@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char *prefix) OPT_END() }; + register_allowed_protocol_version(protocol_v0); + argc = parse_options(argc, argv, prefix, local_opts, NULL, PARSE_OPT_KEEP_ALL); diff --git a/builtin/clone.c b/builtin/clone.c index fd2c3ef090..1651a950b6 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix) struct refspec rs = REFSPEC_INIT_FETCH; struct argv_array ref_prefixes = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + fetch_if_missing = 0; packet_trace_identity("clone"); diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 1a1bc63566..cba935e4d3 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) fetch_if_missing = 0; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch-pack"); memset(, 0, sizeof(args)); diff --git a/builtin/fetch.c b/builtin/fetch.c index 61bec5d213..2a20cf8bfd 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -21,6 +21,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "protocol.h" #include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { @@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) int prune_tags_ok = 1; struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch"); fetch_if_missing = 0; diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 1a25df7ee1..ea685e8bb9 100644 -
[PATCH v5 0/1] Advertise multiple supported proto versions
Changes from V4: remove special cases around advertising version=0. Add a registry of supported wire protocol versions that individual commands can use to declare supported versions before contacting a server. The client will then advertise all supported versions, while the server will choose the first allowed version from the advertised list. Every command that acts as a client or server must now register its supported protocol versions. Josh Steadmon (1): protocol: advertise multiple supported versions builtin/archive.c | 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c| 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c| 3 + builtin/upload-pack.c | 4 ++ connect.c | 52 +++ protocol.c | 122 +--- protocol.h | 23 ++- remote-curl.c | 27 +--- t/t5551-http-fetch-smart.sh | 1 + t/t5570-git-daemon.sh | 2 +- t/t5601-clone.sh| 38 +-- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 21 files changed, 256 insertions(+), 82 deletions(-) Range-diff against v4: 1: 3c023991c0 ! 1: 60f6f2fbd8 protocol: advertise multiple supported versions @@ -21,6 +21,12 @@ Clients now advertise the full list of registered versions. Servers select the first allowed version from this advertisement. +Additionally, remove special cases around advertising version=0. +Previously we avoided adding version negotiation fields in server +responses if it looked like the client wanted v0. However, including +these fields does not change behavior, so it's better to have simpler +code. + While we're at it, remove unnecessary externs from function declarations in protocol.h. @@ -245,18 +251,21 @@ { struct child_process *conn; @@ + prog, path, 0, target_host, 0); - /* If using a new version put that stuff here after a second null byte */ +- /* If using a new version put that stuff here after a second null byte */ - if (version > 0) { -+ if (strcmp(version_advert->buf, "version=0")) { - strbuf_addch(, '\0'); +- strbuf_addch(, '\0'); - strbuf_addf(, "version=%d%c", - version, '\0'); -+ strbuf_addf(, "%s%c", version_advert->buf, '\0'); - } +- } ++ /* Add version fields after a second null byte */ ++ strbuf_addch(, '\0'); ++ strbuf_addf(, "%s%c", version_advert->buf, '\0'); packet_write(fd[1], request.buf, request.len); + @@ */ static void push_ssh_options(struct argv_array *args, struct argv_array *env, @@ -264,9 +273,9 @@ - enum protocol_version version, int flags) + const struct strbuf *version_advert, int flags) { - if (variant == VARIANT_SSH && +- if (variant == VARIANT_SSH && - version > 0) { -+ strcmp(version_advert->buf, "version=0")) { ++ if (variant == VARIANT_SSH) { argv_array_push(args, "-o"); argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT); - argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d", @@ -346,13 +355,13 @@ - if (version > 0) { - argv_array_pushf(>env_array, GIT_PROTOCOL_ENVIRONMENT "=version=%d", - version); -+ if (strcmp(version_advert.buf, "version=0")) { -+ argv_array_pushf(>env_array, -+ GIT_PROTOCOL_ENVIRONMENT "=%s", -+ version_advert.buf); - } +- } ++ argv_array_pushf(>env_array, ++ GIT_PROTOCOL_ENVIRONMENT "=%s", ++ version_advert.buf); } argv_array_push(>args, cmd.buf); + diff --git a/protocol.c b/protocol.c --- a/protocol.c @@ -573,8 +582,7 @@ +static int get_client_protocol_http_header(const struct strbuf *version_advert, + struct strbuf *header) +{ -+ if (version_advert
Re: [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
On 2018.11.16 03:48, Jeff King wrote: > In a v2 smart-http conversation, the server should reply to our initial > request with a pkt-line saying "version 2" (this is the start of the > "capabilities advertisement"). We check for the string using > starts_with(), but that's overly permissive (it would match "version > 20", for example). > > Let's tighten this check to use strcmp(). Note that we don't need to > worry about a trailing newline here, because the ptk-line code will have > chomped it for us already. > > Signed-off-by: Jeff King > --- > remote-curl.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/remote-curl.c b/remote-curl.c > index dd9bc41aa1..3c9c4a07c3 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const > char *service, > d->len = src_len; > d->proto_git = 1; > > - } else if (starts_with(line, "version 2")) { > + } else if (!strcmp(line, "version 2")) { > /* >* v2 smart http; do not consume version packet, which will >* be handled elsewhere. > -- > 2.19.1.1636.gc7a073d580 > Looks good to me. Reviewed-by: Josh Steadmon
Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
On 2018.11.16 03:47, Jeff King wrote: > After making initial contact with an http server, we have to decide if > the server supports smart-http, and if so, which version. Our rules are > a bit inconsistent: > > 1. For v0, we require that the content-type indicates a smart-http > response. We also require the response to look vaguely like a > pkt-line starting with "#". If one of those does not match, we fall > back to dumb-http. > > But according to our http protocol spec[1]: > >Dumb servers MUST NOT return a return type starting with >`application/x-git-`. > > If we see the expected content-type, we should consider it > smart-http. At that point we can parse the pkt-line for real, and > complain if it is not syntactically valid. > > 2. For v2, we do not actually check the content-type. Our v2 protocol > spec says[2]: > >When using the http:// or https:// transport a client makes a >"smart" info/refs request as described in `http-protocol.txt`[...] > > and the http spec is clear that for a smart-http[3]: > >The Content-Type MUST be `application/x-$servicename-advertisement`. > > So it is required according to the spec. > > These inconsistencies were easy to miss because of the way the original > code was written as an inline conditional. Let's pull it out into its > own function for readability, and improve a few things: > > - we now predicate the smart/dumb decision entirely on the presence of >the correct content-type > > - we do a real pkt-line parse before deciding how to proceed (and die >if it isn't valid) > > - use skip_prefix() for comparing service strings, instead of >constructing expected output in a strbuf; this avoids dealing with >memory cleanup > > Note that this _is_ tightening what the client will allow. It's all > according to the spec, but it's possible that other implementations > might violate these. However, violating these particular rules seems > like an odd choice for a server to make. > > [1] Documentation/technical/http-protocol.txt, l. 166-167 > [2] Documentation/technical/protocol-v2.txt, l. 63-64 > [3] Documentation/technical/http-protocol.txt, l. 247 > > Helped-by: Josh Steadmon > Signed-off-by: Jeff King > --- > remote-curl.c | 93 --- > 1 file changed, 59 insertions(+), 34 deletions(-) > > diff --git a/remote-curl.c b/remote-curl.c > index 762a55a75f..dd9bc41aa1 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum > protocol_version version, > return 0; > } > > +static void check_smart_http(struct discovery *d, const char *service, > + struct strbuf *type) > +{ > + char *src_buf; > + size_t src_len; > + char *line; > + const char *p; > + > + /* > + * If we don't see x-$service-advertisement, then it's not smart-http. > + * But once we do, we commit to it and assume any other protocol > + * violations are hard errors. > + */ > + if (!skip_prefix(type->buf, "application/x-", ) || > + !skip_prefix(p, service, ) || > + strcmp(p, "-advertisement")) > + return; > + > + /* > + * "Peek" at the first packet by using a separate buf/len pair; some > + * cases below require us leaving the originals intact. > + */ > + src_buf = d->buf; > + src_len = d->len; > + line = packet_read_line_buf(_buf, _len, NULL); > + if (!line) > + die("invalid server response; expected service, got flush > packet"); > + > + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) { > + /* > + * The header can include additional metadata lines, up > + * until a packet flush marker. Ignore these now, but > + * in the future we might start to scan them. > + */ > + while (packet_read_line_buf(_buf, _len, NULL)) > + ; > + > + /* > + * v0 smart http; callers expect us to soak up the > + * service and header packets > + */ > + d->buf = src_buf; > + d->len = src_len; > + d->proto_git = 1; > + > + } else if (starts_with(line, "version 2")) { > + /* > + * v2 smart http; do not consume version packet, which will > + * be handled elsewhere
Re: [PATCH 0/3] remote-curl smart-http discovery cleanup
On 2018.11.16 03:44, Jeff King wrote: [...] > Amusingly, this does break the test you just added, because it tries to > issue an ERR after claiming "text/html" (and after my patch, we > correctly fall back to dumb-http). Heh yeah, I copied the script from a dumb-http test without reading the spec.
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On 2018.11.16 11:45, Junio C Hamano wrote: > Josh Steadmon writes: > > >> What I was alludding to was a lot simpler, though. An advert string > >> "version=0:version=1" from a client that prefers version 0 won't be > >> !strcmp("version=0", advert) but seeing many strcmp() in the patch > >> made me wonder. > > > > Ah I see. The strcmp()s against "version=0" are special cases for where > > it looks like the client does not understand any sort of version > > negotiation. If we see multiple versions listed in the advert, then the > > rest of the selection logic should do the right thing. > > OK, let me try to see if I understand correctly by rephrasing. > > If the client does not say anything about which version it prefers > (because it only knows about version 0 without even realizing that > there is a version negotiation exchange), we substitute the lack of > proposed versions with string "version=0", and the strcmp()s I saw > and was puzzled by are all about special casing such a client. But > we would end up behaving the same way (at least when judged only by > externally visible effects) to a client that is aware of version > negotiation and tells us it prefers version 0 (and nothing else) > with the selection logic anyway. > > Did I get it right? If so, yeah, I think it makes sense to avoid > two codepaths that ends up doing the same thing by removing the > special case. Yes, that is correct. The next version will remove the special cases here. > > However, I think that it might work to remove the special cases. In the > > event that the client is so old that it doesn't understand any form of > > version negotiation, it should just ignore the version fields / > > environment vars. If you think it's cleaner to remove the special cases, > > let me know.
Re: [PATCH] remote-curl: die on server-side errors
On 2018.11.14 02:00, Jeff King wrote: > On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote: > > > Yes, the packet_read_line_buf() interface will both advance the buf > > pointer and decrement the length. So if we want to "peek", we have to > > do so with a copy (there's a peek function if you use the packet_reader > > interface, but that might be overkill here). > > > > You can rewrite it like this, which is a pretty faithful conversion and > > passes the tests (but see below). > > [...] > > Here's a version which is less faithful, but I think does sensible > things in all cases, and is much easier to follow. I get a little > nervous just because it tightens some cases, and one never knows if > other implementations might be relying on the looseness. E.g.: > > - in the current code we'd still drop back to dumb http if the server > tells us "application/x-git-upload-pack" but the initial pktline > doesn't start with "#" (even though if it _does_ have "#" at > position 5 but isn't a valid pktline, we'd complain!) > > - right now the v2 protocol does not require the server to say > "application/x-git-upload-pack" for the content-type > > This patch tightens both of those (I also made a few stylistic tweaks, > and added the ERR condition to show where it would go). I dunno. Part of > me sees this as a nice cleanup, but maybe it is better to just leave it > alone. A lot of these behaviors are just how it happens to work now, and > not part of the spec, but we don't know what might be relying on them. At least according to the protocol-v2 and http-protocol docs, the stricter behavior seems correct: For the first point above, dumb servers should never use an "application/x-git-*" content type (http-protocol.txt line 163-167). For the second point, the docs require v2 servers to use "application/x-git-*" content types. protocol-v2.txt lines 63-65 state that v2 clients should make a smart http request, while http-protocol.txt lines 247-252 state that a smart server's response type must be "application/x-git-*". Of course we don't know if other implementations follow the spec, but ISTM that this patch at least doesn't contradict how we've promised the protocols should work. If no one has any objections, I'll include the diff below in v2. Thanks for the help Jeff! > diff --git a/remote-curl.c b/remote-curl.c > index 762a55a75f..1adb96311b 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -330,9 +330,61 @@ static int get_protocol_http_header(enum > protocol_version version, > return 0; > } > > +static void check_smart_http(struct discovery *d, const char *service, > + struct strbuf *type) > +{ > + char *src_buf; > + size_t src_len; > + char *line; > + const char *p; > + > + if (!skip_prefix(type->buf, "application/x-", ) || > + !skip_prefix(p, service, ) || > + strcmp(p, "-advertisement")) > + return; > + > + /* > + * We speculatively try to read a packet, which means we must preserve > + * the original buf/len pair in some cases. > + */ > + src_buf = d->buf; > + src_len = d->len; > + line = packet_read_line_buf(_buf, _len, NULL); > + if (!line) > + die("invalid server response; expected service, got flush > packet"); > + > + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) { > + /* > + * The header can include additional metadata lines, up > + * until a packet flush marker. Ignore these now, but > + * in the future we might start to scan them. > + */ > + while (packet_read_line_buf(_buf, _len, NULL)) > + ; > + > + /* > + * v0 smart http; callers expect us to soak up the > + * service and header packets > + */ > + d->buf = src_buf; > + d->len = src_len; > + d->proto_git = 1; > + > + } else if (starts_with(line, "version 2")) { /* should be strcmp()? */ > + /* > + * v2 smart http; do not consume version packet, which will > + * be handled elsewhere. > + */ > + d->proto_git = 1; > + } else if (skip_prefix(line, "ERR ", )) { > + die(_("remote error: %s"), p); > + } else { > + die("invalid server response; got '%s'", line); > + } > +} > + > static struct discovery *discover_refs(const char *service, int for_push) > { > - struct strbuf exp = STRBUF_INIT; > struct strbuf type = STRBUF_INIT; > struct strbuf charset = STRBUF_INIT; > struct strbuf buffer = STRBUF_INIT; > @@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char > *service, int for_push) > last->buf_alloc = strbuf_detach(, >len); > last->buf = last->buf_alloc; > > - strbuf_addf(, "application/x-%s-advertisement", service); > - if
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On 2018.11.14 11:38, Junio C Hamano wrote: > Josh Steadmon writes: > > > On 2018.11.13 13:01, Junio C Hamano wrote: > >> I am wondering if the code added by this patch outside this > >> function, with if (strcmp(client_ad.buf, "version=0") sprinkled all > >> over the place, works sensibly when the other side says "I prefer > >> version=0 but I do not mind talking version=1". > > > > Depends on what you mean by "sensibly" :). In the current case, if the > > client prefers v0, it will always end up using v0. After the fixes > > described above, it will always use v0 unless the server no longer > > supports v0. Is that what you would expect? > > Yes, that sounds like a sensible behaviour. > > What I was alludding to was a lot simpler, though. An advert string > "version=0:version=1" from a client that prefers version 0 won't be > !strcmp("version=0", advert) but seeing many strcmp() in the patch > made me wonder. Ah I see. The strcmp()s against "version=0" are special cases for where it looks like the client does not understand any sort of version negotiation. If we see multiple versions listed in the advert, then the rest of the selection logic should do the right thing. However, I think that it might work to remove the special cases. In the event that the client is so old that it doesn't understand any form of version negotiation, it should just ignore the version fields / environment vars. If you think it's cleaner to remove the special cases, let me know.
Re: [PATCH v4 0/1] Advertise multiple supported proto versions
On 2018.11.14 19:22, Junio C Hamano wrote: > Josh Steadmon writes: > > > Fix several bugs identified in v3, clarify commit message, and clean up > > extern keyword in protocol.h. > > It is good to descirbe the change relative to v3 here, which would > help those who are interested and reviewed v3. > > To help those who missed the boat and v4 is their first encounter > with this series, having the description relative to v3 alone and > nothing else is not very friendly, though. Ack. Will keep this in mind for future patches. > > + diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c > > + --- a/builtin/upload-archive.c > > + +++ b/builtin/upload-archive.c > > +@@ > > + #include "builtin.h" > > + #include "archive.h" > > + #include "pkt-line.h" > > ++#include "protocol.h" > > + #include "sideband.h" > > + #include "run-command.h" > > + #include "argv-array.h" > > +@@ > > + if (argc == 2 && !strcmp(argv[1], "-h")) > > + usage(upload_archive_usage); > > + > > ++ register_allowed_protocol_version(protocol_v0); > > ++ > > + /* > > +* Set up sideband subprocess. > > +* > > This one unfortunately seems to interact rather badly with your > js/remote-archive-v2 topic which is already in 'next', when this > topic is merged to 'pu', and my attempt to mechanically resolve > conflicts breaks t5000. Hmm, should we drop js/remote-archive-v2 then? Any solution that unblocks js/remote-archive-v2 will almost certainly depend on this patch.
[PATCH v2] Makefile: use FUZZ_CXXFLAGS for linking fuzzers
OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in CFLAGS causes lots of build warnings. Using separate FUZZ_CXXFLAGS avoids this. Signed-off-by: Josh Steadmon --- Since there's nothing else using CXXFLAGS, let's just make it explicit that these apply to the fuzzers. Range-diff against v1: 1: 1630a93f82 < -: -- Makefile: use CXXFLAGS for linking fuzzers -: -- > 1: 6b3d6dd7f0 Makefile: use FUZZ_CXXFLAGS for linking fuzzers Makefile | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index bbfbb4292d..0c05896797 100644 --- a/Makefile +++ b/Makefile @@ -3098,14 +3098,16 @@ cover_db_html: cover_db # An example command to build against libFuzzer from LLVM 4.0.0: # # make CC=clang CXX=clang++ \ -# CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ +# FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ # LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \ # fuzz-all # +FUZZ_CXXFLAGS ?= $(CFLAGS) + .PHONY: fuzz-all $(FUZZ_PROGRAMS): all - $(QUIET_LINK)$(CXX) $(CFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \ + $(QUIET_LINK)$(CXX) $(FUZZ_CXXFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \ $(XDIFF_OBJS) $(EXTLIBS) git.o $@.o $(LIB_FUZZING_ENGINE) -o $@ fuzz-all: $(FUZZ_PROGRAMS) -- 2.19.1.930.g4563a0d9d0-goog
[PATCH v4 1/1] protocol: advertise multiple supported versions
Currently the client advertises that it supports the wire protocol version set in the protocol.version config. However, not all services support the same set of protocol versions. For example, git-receive-pack supports v1 and v0, but not v2. If a client connects to git-receive-pack and requests v2, it will instead be downgraded to v0. Other services, such as git-upload-archive, do not do any version negotiation checks. This patch creates a protocol version registry. Individual client and server programs register all the protocol versions they support prior to communicating with a remote instance. Versions should be listed in preference order; the version specified in protocol.version will automatically be moved to the front of the registry. The protocol version registry is passed to remote helpers via the GIT_PROTOCOL environment variable. Clients now advertise the full list of registered versions. Servers select the first allowed version from this advertisement. While we're at it, remove unnecessary externs from function declarations in protocol.h. Signed-off-by: Josh Steadmon --- builtin/archive.c| 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c | 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c | 3 + builtin/upload-pack.c| 4 ++ connect.c| 47 +++ protocol.c | 122 --- protocol.h | 23 +++- remote-curl.c| 28 ++--- t/t5570-git-daemon.sh| 2 +- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 19 files changed, 237 insertions(+), 58 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e74f675390..8adb9f381b 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -8,6 +8,7 @@ #include "transport.h" #include "parse-options.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" static void create_output_file(const char *output_file) @@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char *prefix) OPT_END() }; + register_allowed_protocol_version(protocol_v0); + argc = parse_options(argc, argv, prefix, local_opts, NULL, PARSE_OPT_KEEP_ALL); diff --git a/builtin/clone.c b/builtin/clone.c index fd2c3ef090..1651a950b6 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix) struct refspec rs = REFSPEC_INIT_FETCH; struct argv_array ref_prefixes = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + fetch_if_missing = 0; packet_trace_identity("clone"); diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 1a1bc63566..cba935e4d3 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) fetch_if_missing = 0; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch-pack"); memset(, 0, sizeof(args)); diff --git a/builtin/fetch.c b/builtin/fetch.c index 61bec5d213..2a20cf8bfd 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -21,6 +21,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "protocol.h" #include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { @@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) int prune_tags_ok = 1; struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch"); fetch_if_missing = 0; diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 1a25df7ee1..ea685e8bb9 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -1,5 +1,6 @@ #include "builtin.h" #include "cache.h" +#include "protocol.h" #include "transport.h" #include "ref-filter.h" #include "remote.h" @@ -80,6 +81,10 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix) memset(_array, 0, sizeof(ref_array)); + register_al
[PATCH v4 0/1] Advertise multiple supported proto versions
Fix several bugs identified in v3, clarify commit message, and clean up extern keyword in protocol.h. Josh Steadmon (1): protocol: advertise multiple supported versions builtin/archive.c| 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c | 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c | 3 + builtin/upload-pack.c| 4 ++ connect.c| 47 +++ protocol.c | 122 --- protocol.h | 23 +++- remote-curl.c| 28 ++--- t/t5570-git-daemon.sh| 2 +- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 19 files changed, 237 insertions(+), 58 deletions(-) Range-diff against v3: 1: b9968e3fb0 ! 1: 3c023991c0 protocol: advertise multiple supported versions @@ -4,22 +4,25 @@ Currently the client advertises that it supports the wire protocol version set in the protocol.version config. However, not all services -support the same set of protocol versions. When connecting to -git-receive-pack, the client automatically downgrades to v0 if -config.protocol is set to v2, but this check is not performed for other -services. +support the same set of protocol versions. For example, git-receive-pack +supports v1 and v0, but not v2. If a client connects to git-receive-pack +and requests v2, it will instead be downgraded to v0. Other services, +such as git-upload-archive, do not do any version negotiation checks. -This patch creates a protocol version registry. Individual operations -register all the protocol versions they support prior to communicating -with a server. Versions should be listed in preference order; the -version specified in protocol.version will automatically be moved to the -front of the registry. +This patch creates a protocol version registry. Individual client and +server programs register all the protocol versions they support prior to +communicating with a remote instance. Versions should be listed in +preference order; the version specified in protocol.version will +automatically be moved to the front of the registry. The protocol version registry is passed to remote helpers via the GIT_PROTOCOL environment variable. Clients now advertise the full list of registered versions. Servers -select the first recognized version from this advertisement. +select the first allowed version from this advertisement. + +While we're at it, remove unnecessary externs from function declarations +in protocol.h. Signed-off-by: Josh Steadmon @@ -165,6 +168,20 @@ git_config(git_push_config, ); argc = parse_options(argc, argv, prefix, options, push_usage, 0); + diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c + --- a/builtin/receive-pack.c + +++ b/builtin/receive-pack.c +@@ + + packet_trace_identity("receive-pack"); + ++ register_allowed_protocol_version(protocol_v1); ++ register_allowed_protocol_version(protocol_v0); ++ + argc = parse_options(argc, argv, prefix, options, receive_pack_usage, 0); + + if (argc > 1) + diff --git a/builtin/send-pack.c b/builtin/send-pack.c --- a/builtin/send-pack.c +++ b/builtin/send-pack.c @@ -179,6 +196,42 @@ argc = parse_options(argc, argv, prefix, options, send_pack_usage, 0); if (argc > 0) { + diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c + --- a/builtin/upload-archive.c + +++ b/builtin/upload-archive.c +@@ + #include "builtin.h" + #include "archive.h" + #include "pkt-line.h" ++#include "protocol.h" + #include "sideband.h" + #include "run-command.h" + #include "argv-array.h" +@@ + if (argc == 2 && !strcmp(argv[1], "-h")) + usage(upload_archive_usage); + ++ register_allowed_protocol_version(protocol_v0); ++ + /* +* Set up sideband subprocess. +* + + diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c + --- a/builtin/upload-pack.c + +++ b/builtin/upload-pack.c +@@ + packet_trace_identity("upload-pack"); + read_replace_refs = 0; + ++ register_allowed_protocol_version(protocol_v2); ++ register_allowed_protocol_version(protocol_v1); ++ register_allowed_protocol_version(protocol_v0);
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On 2018.11.13 19:28, SZEDER Gábor wrote: > On Mon, Nov 12, 2018 at 01:49:05PM -0800, stead...@google.com wrote: > > > diff --git a/protocol.c b/protocol.c > > index 5e636785d1..54d2ab991b 100644 > > --- a/protocol.c > > +++ b/protocol.c > > > +void get_client_protocol_version_advertisement(struct strbuf *advert) > > +{ > > + int tmp_nr = nr_allowed_versions; > > + enum protocol_version *tmp_allowed_versions, config_version; > > + strbuf_reset(advert); > > + > > + have_advertised_versions_already = 1; > > + > > + config_version = get_protocol_version_config(); > > + if (config_version == protocol_v0) { > > + strbuf_addstr(advert, "version=0"); > > + return; > > + } > > + > > + if (tmp_nr > 0) { > > + ALLOC_ARRAY(tmp_allowed_versions, tmp_nr); > > + copy_array(tmp_allowed_versions, allowed_versions, tmp_nr, > > + sizeof(enum protocol_version)); > > + } else { > > + ALLOC_ARRAY(tmp_allowed_versions, 1); > > + tmp_nr = 1; > > + tmp_allowed_versions[0] = config_version; > > + } > > + > > + if (tmp_allowed_versions[0] != config_version) > > + for (int i = 1; i < nr_allowed_versions; i++) > > We don't do C99 yet, thus the declaration of a loop variable like this > is not allowed and triggers compiler errors. > > > + if (tmp_allowed_versions[i] == config_version) { > > + enum protocol_version swap = > > + tmp_allowed_versions[0]; > > + tmp_allowed_versions[0] = > > + tmp_allowed_versions[i]; > > + tmp_allowed_versions[i] = swap; > > + } > > + > > + strbuf_addf(advert, "version=%s", > > + format_protocol_version(tmp_allowed_versions[0])); > > + for (int i = 1; i < tmp_nr; i++) > > Likewise. > > > + strbuf_addf(advert, ":version=%s", > > + format_protocol_version(tmp_allowed_versions[i])); > > +} Sorry about that. Will fix in v4. Out of curiousity, do you have a config.mak snippet that will make these into errors? I played around with adding combinations of -ansi, -std=c89, and -pedantic to CFLAGS, but I couldn't get anything that detect the problem without also breaking on other parts of the build.
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On 2018.11.13 13:01, Junio C Hamano wrote: > stead...@google.com writes: > > > Currently the client advertises that it supports the wire protocol > > version set in the protocol.version config. However, not all services > > support the same set of protocol versions. When connecting to > > git-receive-pack, the client automatically downgrades to v0 if > > config.protocol is set to v2, but this check is not performed for other > > services. > > "downgrades to v0 even if ... is set to v2" you mean? Otherwise it > is unclear why asking for v2 leads to using v0. The downgrade on push happens only when the the configured version is v2. If v1 is requested, no downgrade is triggered. I'll clarify the commit message in the next version. > > This patch creates a protocol version registry. Individual operations > > register all the protocol versions they support prior to communicating > > with a server. Versions should be listed in preference order; the > > version specified in protocol.version will automatically be moved to the > > front of the registry. > > > > The protocol version registry is passed to remote helpers via the > > GIT_PROTOCOL environment variable. > > > > Clients now advertise the full list of registered versions. Servers > > select the first recognized version from this advertisement. > > Makes sense. > > > +void get_client_protocol_version_advertisement(struct strbuf *advert) > > +{ > > + int tmp_nr = nr_allowed_versions; > > + enum protocol_version *tmp_allowed_versions, config_version; > > + strbuf_reset(advert); > > + > > + have_advertised_versions_already = 1; > > + > > + config_version = get_protocol_version_config(); > > + if (config_version == protocol_v0) { > > + strbuf_addstr(advert, "version=0"); > > + return; > > + } > > + > > + if (tmp_nr > 0) { > > + ALLOC_ARRAY(tmp_allowed_versions, tmp_nr); > > + copy_array(tmp_allowed_versions, allowed_versions, tmp_nr, > > + sizeof(enum protocol_version)); > > + } else { > > + ALLOC_ARRAY(tmp_allowed_versions, 1); > > + tmp_nr = 1; > > + tmp_allowed_versions[0] = config_version; > > + } > > + > > + if (tmp_allowed_versions[0] != config_version) > > + for (int i = 1; i < nr_allowed_versions; i++) > > + if (tmp_allowed_versions[i] == config_version) { > > + enum protocol_version swap = > > + tmp_allowed_versions[0]; > > + tmp_allowed_versions[0] = > > + tmp_allowed_versions[i]; > > + tmp_allowed_versions[i] = swap; > > + } > > + > > + strbuf_addf(advert, "version=%s", > > + format_protocol_version(tmp_allowed_versions[0])); > > + for (int i = 1; i < tmp_nr; i++) > > + strbuf_addf(advert, ":version=%s", > > + format_protocol_version(tmp_allowed_versions[i])); > > +} > > + > > So the idea is that the protocols the other end can talk come in > advert in their preferred order, and we take an intersection of them > and our "allowed-versions", but the preference is further skewed > with the "swap" thing if we have our own preference specified via > config? We currently don't intersect with our own allowed list, we just accept the first version that we recognize. This introduces its own version negotiation bug; if we add v2 push in the future, a new client talking to an old server would try to use v2 even though the server may not have the corresponding v2 push implementation. I'll fix this in the next version. In any case, the ordering of the server's allowed versions won't matter; we'll pick the the first version in the client's list which is also allowed on the server. > I am wondering if the code added by this patch outside this > function, with if (strcmp(client_ad.buf, "version=0") sprinkled all > over the place, works sensibly when the other side says "I prefer > version=0 but I do not mind talking version=1". Depends on what you mean by "sensibly" :). In the current case, if the client prefers v0, it will always end up using v0. After the fixes described above, it will always use v0 unless the server no longer supports v0. Is that what you would expect? > Isn't tmp_allowed_versions[] leaking when we return from this > function? Yes, sorry about that. Will fix.
Re: [PATCH] remote-curl: die on server-side errors
On 2018.11.13 23:30, Junio C Hamano wrote: > stead...@google.com writes: > > > When a smart HTTP server sends an error message via pkt-line, > > remote-curl will fail to detect the error (which usually results in > > incorrectly falling back to dumb-HTTP mode). > > OK, that is a valid thing to worry about. > > > > > This patch adds a check in discover_refs() for server-side error > > messages, as well as a test case for this issue. > > This makes me wonder if discoer_refs() is the only place where we > ought to be checking for ERR packets but we are not. Are there > other places that we also need a similar fix? Quite possibly; I'll review the other client code to see if there are similar issues before sending v2.
Re: [PATCH] remote-curl: die on server-side errors
On 2018.11.13 09:26, Jeff King wrote: > On Mon, Nov 12, 2018 at 02:44:56PM -0800, stead...@google.com wrote: > > > When a smart HTTP server sends an error message via pkt-line, > > remote-curl will fail to detect the error (which usually results in > > incorrectly falling back to dumb-HTTP mode). > > > > This patch adds a check in discover_refs() for server-side error > > messages, as well as a test case for this issue. > > Aside from the reformatting of the conditional that Junio mentioned, > this seems pretty good to me. But while looking at that, I found a few > things, some old and some new. :) > > > diff --git a/remote-curl.c b/remote-curl.c > > index 762a55a75f..bb3a86505e 100644 > > --- a/remote-curl.c > > +++ b/remote-curl.c > > @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char > > *service, int for_push) > > } else if (maybe_smart && > >last->len > 5 && starts_with(last->buf + 4, "version 2")) { > > last->proto_git = 1; > > - } > > + } else if (maybe_smart && last->len > 5 && > > + starts_with(last->buf + 4, "ERR ")) > > + die(_("remote error: %s"), last->buf + 8); > > The magic "4" here and in the existing "version 2" check is because we > are expecting pkt-lines. The original conditional always calls > packed_read_line_buf() and will complain if we didn't actually get a > pkt-line. > > Should we confirm that we got a real packet-line? Or at least that those > first four are even plausible hex chars? > > I admit that it's pretty unlikely that the server is going to fool us > here. It would need something like "foo ERRORS ARE FUN!". And even then > we'd report an error (whereas the correct behavior is falling back to > dumb http, but we know that won't work anyway because that's not a valid > ref advertisement). So I doubt this is really a bug per se, but it might > make it easier to understand what's going on if we actually parsed the > packet. Unfortunately we can't just directly parse the data in last->buf, because other parts of the code are expecting to see the raw pkt-line data there. I tried adding a duplicate pointer and length variable for this data and parsing that through packet_read_line_buf(), but even without using the results this apparently has side-effects that break all of t5550 (and probably other tests as well). It also fails if I completely duplicate last->buf into a new char* and call packet_readline_buf() on that, so there's clearly some interaction in the pkt-line guts that I'm not properly accounting for. > Similarly, we seem eager to accept "version 2" even if we are only > expecting v0. I know you have another series working in that direction, > but I don't think it touches this "proto_git". I guess accepting > "version 2" as "we're speaking git protocol" and then barfing later with > "wait, I didn't mean to speak v2" is probably OK. > > -Peff
Re: [PATCH] remote-curl: die on server-side errors
On 2018.11.13 12:02, Junio C Hamano wrote: > stead...@google.com writes: > > > When a smart HTTP server sends an error message via pkt-line, > > remote-curl will fail to detect the error (which usually results in > > incorrectly falling back to dumb-HTTP mode). > > > > This patch adds a check in discover_refs() for server-side error > > messages, as well as a test case for this issue. > > > > Signed-off-by: Josh Steadmon > > --- > > Forgot to mention one procedural comment. > > As you can see in the To: line of this reply, your MUA is placing > only the e-mail address without name on your From: line. > > Preferrably I'd like to see the same string as your sign-off on the > "From:" line in your messages for a bit of human touch ;-) Can you > tweak your MUA to make that happen? > > The second preference is to add an in-body header (i.e. as the first > line of the body of the message) so that the body of the message > starts like this: > > From: Josh Steadmon > > When a smart HTTP server sends an error message via pkt-line, > remote-curl will fail to detect the error (which usually results in > incorrectly falling back to dumb-HTTP mode). > > This patch adds a check in discover_refs() for server-side error > messages, as well as a test case for this issue. > > Signed-off-by: Josh Steadmon > --- > remote-curl.c | 4 +++- > t/lib-httpd.sh | 1 + > t/lib-httpd/apache.conf | 4 > > Either way would make sure that the resulting patch's author line > will be attributed correctly to the same identity as who is signing > it off first as the author. > > Thanks. This should be fixed for future patch submissions. Thanks for the heads-up.
Re: [PATCH] Makefile: use CXXFLAGS for linking fuzzers
On 2018.11.13 12:12, Junio C Hamano wrote: > stead...@google.com writes: > > > OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in > > CFLAGS causes lots of build warnings. Using separate CXXFLAGS avoids > > this. > > We are not a C++ shop, so allow me to show ignorance about how > projects that are OSS-Fuzz-enabled work. Do they use one set of > CXXFLAGS when compiling the "real thing" and a separate set (perhaps > one is subset of the other, or perhaps these two just have overlap) > of CXXFLAGS when building to link with the fuzzer? > > What I am trying to get at is if this should be CXXFLAGS or > CXX_FUZZER_FLAGS. If the OSS-Fuzz-enabled C++ projects use one set > of flags for the "main" part of the project (to produce binaries to > be run by the end users) and then link with extra flags to work with > fuzzers, I would imagine that they won't call the latter CXXFLAGS > but call it something else, and we probably should follow suit if > that is the case. > > Not that we plan to (re)write the maint part of Git in C++ ever, so > I am personally OK with sacrificing the most generic CXXFLAGS macro > for the sole use of OSS-Fuzz linkage, but I'd prefer to leave the > door open so that other things like OSS-Fuzz that require C++ can be > added like your work does to the project. > > Thanks. OSS-Fuzz only provides one set of CXXFLAGS for use on both compiling project C++ project files as well linking the fuzzers themselves. So in the event that Git ever added any C++ sources, they would need to use the same set of CXXFLAGS. Given that, do you agree with Stefan that it is more intuitive to define CXXFLAGS next to the fuzzer build rules, since that's the only place it's used for now?
Re: [PATCH 1/1] apply --recount: allow "no-op hunks"
On 2018.11.12 12:54, Johannes Schindelin via GitGitGadget wrote: > From: Johannes Schindelin > > When editing patches e.g. in `git add -e`, it is quite common that a > hunk ends up having no -/+ lines, i.e. it is now supposed to do nothing. > > This use case was broken by ad6e8ed37bc1 (apply: reject a hunk that does > not do anything, 2015-06-01) with the good intention of catching a very > real, different issue in hand-edited patches. > > So let's use the `--recount` option as the tell-tale whether the user > would actually be okay with no-op hunks. > > Add a test case to make sure that this use case does not regress again. > > Signed-off-by: Johannes Schindelin Looks good to me. Reviewed-by: Josh Steadmon
Re: [PATCH] coccicheck: introduce 'pending' semantic patches
On 2018.11.09 16:10, Stefan Beller wrote: > From: SZEDER Gábor > > Teach `make coccicheck` to avoid patches named "*.pending.cocci" and > handle them separately in a new `make coccicheck-pending` instead. > This means that we can separate "critical" patches from "FYI" patches. > The former target can continue causing Travis to fail its static > analysis job, while the latter can let us keep an eye on ongoing > (pending) transitions without them causing too much fallout. > > Document the intended use-cases around these two targets. > As the process around the pending patches is not yet fully explored, > leave that out. > > Based-on-work-by: SZEDER Gábor > Signed-off-by: Stefan Beller > Signed-off-by: Junio C Hamano > --- > > I dialed back on the workflow, as we may want to explore it first > before writing it down. > > Stefan Looks good to me. Reviewed-by: Josh Steadmon
Re: [PATCH] upload-pack: fix broken if/else chain in config callback
On 2018.10.24 10:50, Johannes Schindelin wrote: > Maybe a lot of explanation, but definitely a good one. The explanation and > the patch look good to me. > > Thanks, > Dscho Agreed, as a newbie I definitely appreciate detailed explanations. Looks good to me as well. Reviewed-by: Josh Steadmon
Re: [PATCH v3] archive: initialize archivers earlier
On 2018.10.23 13:09, Junio C Hamano wrote: > stead...@google.com writes: > > > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh > > index 2a97b27b0a..cfd5ca492f 100755 > > --- a/t/t5000-tar-tree.sh > > +++ b/t/t5000-tar-tree.sh > > @@ -39,6 +39,8 @@ test_lazy_prereq TAR_NEEDS_PAX_FALLBACK ' > > > > test_lazy_prereq GZIP 'gzip --version' > > > > +test_lazy_prereq ZIP 'zip --version' > > + > > There are a handful of zip implementations; Info-ZIP found on many > Linux distros does support 'zip --version', but we may want to make > sure this test covers different implementations of zip sufficiently. > > Queuing this patch (or an update of it) on 'pu' and hoping those > with zip from different origins to try it would not help very much, > either, as zip implementations that do not react to "zip --version" > would silently turn the prereq off without breaking anything. > > In any case, please refrain from adding any ZIP prerequiste to t5000 > which is about tar; t5003-archive-zip may be a much better fit. It > has an already working machinery that validates the generated zip > archive under UNZIP prerequisite, so we may not even have to invent > our own ZIP prereq if we did so. Ack. This has been removed in v4. V4 also has a test case in t5003 based on Jeff's advice. > > @@ -206,6 +208,19 @@ test_expect_success 'git archive with --output, > > override inferred format' ' > > test_cmp_bin b.tar d4.zip > > ' > > > > +test_expect_success GZIP 'git archive with --output and --remote creates > > .tgz' ' > > + git archive --output=d5.tgz --remote=. HEAD && > > + gzip -d -c < d5.tgz > d5.tar && > > + test_cmp_bin b.tar d5.tar > > +' > > We try to write redirections without SP between redirection operator > and target filename, i.e. "gzip -d -c d5.tar". Fixed in v5.
Re: [PATCH v2] archive: initialize archivers earlier
On 2018.10.22 20:06, Jeff King wrote: > On Mon, Oct 22, 2018 at 04:51:27PM -0700, Josh Steadmon wrote: > > > > > +test_expect_success GZIP 'git archive with --output and --remote uses > > > > expected format' ' > > > > + git archive --output=d5.tgz --remote=. HEAD && > > > > + gzip -d -c < d5.tgz > d5.tar && > > > > + test_cmp_bin b.tar d5.tar > > > > +' > > > > > > This nicely tests the more-interesting tgz case. But unfortunately it > > > won't run on machines without the GZIP prerequisite. I'd think that > > > would really be _most_ machines, but is it worth having a separate zip > > > test to cover machines without gzip? I guess that just creates the > > > opposite problem: not everybody has ZIP. > > > > Added a test to compare the file lists from the .zip file to the > > reference .tar file. I'm not sure if this is the best way to do things, > > but it at least verifies that a .zip is produced. However, it's brittle > > if the output of "zip -sf" changes. Let me know if you have a better > > idea. > > I wonder if we could do something more black-box. What we really care > about here is not the exact output, but rather that "-o foo.zip" > produces the same output as "--format zip". Could we do that without > even relying on ZIP? > > I think it should follow even for tgz, because we use "-n" for a > repeatable output. But there we are relying on an external gzip just to > _create_ the file, so we'd still need the GZIP prereq. > > Hmm. Looks like we already have a similar test in t5003. So maybe just: > > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh > index 55c7870997..cf19f56924 100755 > --- a/t/t5003-archive-zip.sh > +++ b/t/t5003-archive-zip.sh > @@ -158,11 +158,16 @@ test_expect_success 'git archive --format=zip with > --output' \ > 'git archive --format=zip --output=d2.zip HEAD && > test_cmp_bin d.zip d2.zip' > > -test_expect_success 'git archive with --output, inferring format' ' > +test_expect_success 'git archive with --output, inferring format (local)' ' > git archive --output=d3.zip HEAD && > test_cmp_bin d.zip d3.zip > ' > > +test_expect_success 'git archive with --output, ferring format (remote)' ' > + git archive --remote=. --output=d4.zip HEAD && > + test_cmp_bin d.zip d4.zip > +' > + > test_expect_success \ > 'git archive --format=zip with prefix' \ > 'git archive --format=zip --prefix=prefix/ HEAD >e.zip' > > which I think exposes the bug and can run everywhere? Makes sense, thanks!
Re: [PATCH v2] archive: initialize archivers earlier
On 2018.10.22 18:35, Jeff King wrote: > On Mon, Oct 22, 2018 at 02:48:11PM -0700, stead...@google.com wrote: > > > Initialize archivers as soon as possible when running git-archive and > > git-upload-archive. Various non-obvious behavior depends on having the > > archivers initialized, such as determining the desired archival format > > from the provided filename. > > > > Since 08716b3c11 ("archive: refactor file extension format-guessing", > > 2011-06-21), archive_format_from_filename() has used the registered > > archivers to match filenames (provided via --output) to archival > > formats. However, when git-archive is executed with --remote, format > > detection happens before the archivers have been registered. This causes > > archives from remotes to always be generated as TAR files, regardless of > > the actual filename (unless an explicit --format is provided). > > > > This patch fixes that behavior; archival format is determined properly > > from the output filename, even when --remote is used. > > > > Signed-off-by: Josh Steadmon > > Helped-by: Jeff King > > Thanks, this looks good overall. > > A few minor comments (that I'm not even sure are worth re-rolling for): > > > diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c > > index 25d9116356..3f35ebcfe8 100644 > > --- a/builtin/upload-archive.c > > +++ b/builtin/upload-archive.c > > @@ -43,6 +43,7 @@ int cmd_upload_archive_writer(int argc, const char > > **argv, const char *prefix) > > } > > > > /* parse all options sent by the client */ > > + init_archivers(); > > return write_archive(sent_argv.argc, sent_argv.argv, prefix, > > the_repository, NULL, 1); > > } > > This seems to separate the comment from what it describes. Any reason > not to just init_archivers() closer to the top of the function here > (probably after the enter_repo() call)? Ack, fixed. > > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh > > index 2a97b27b0a..3e95fdf660 100755 > > --- a/t/t5000-tar-tree.sh > > +++ b/t/t5000-tar-tree.sh > > @@ -206,6 +206,12 @@ test_expect_success 'git archive with --output, > > override inferred format' ' > > test_cmp_bin b.tar d4.zip > > ' > > > > +test_expect_success GZIP 'git archive with --output and --remote uses > > expected format' ' > > + git archive --output=d5.tgz --remote=. HEAD && > > + gzip -d -c < d5.tgz > d5.tar && > > + test_cmp_bin b.tar d5.tar > > +' > > This nicely tests the more-interesting tgz case. But unfortunately it > won't run on machines without the GZIP prerequisite. I'd think that > would really be _most_ machines, but is it worth having a separate zip > test to cover machines without gzip? I guess that just creates the > opposite problem: not everybody has ZIP. Added a test to compare the file lists from the .zip file to the reference .tar file. I'm not sure if this is the best way to do things, but it at least verifies that a .zip is produced. However, it's brittle if the output of "zip -sf" changes. Let me know if you have a better idea.
Re: [PATCH v2 1/1] protocol: advertise multiple supported versions
On 2018.10.12 15:30, Stefan Beller wrote: > On Thu, Oct 11, 2018 at 6:02 PM wrote: > > > > From: Josh Steadmon > > > > Currently the client advertises that it supports the wire protocol > > version set in the protocol.version config. However, not all services > > support the same set of protocol versions. When connecting to > > git-receive-pack, the client automatically downgrades to v0 if > > config.protocol is set to v2, but this check is not performed for other > > services. > > > > This patch creates a protocol version registry. Individual commands > > register all the protocol versions they support prior to communicating > > with a server. Versions should be listed in preference order; the > > version specified in protocol.version will automatically be moved to the > > front of the registry. > > > > The protocol version registry is passed to remote helpers via the > > GIT_PROTOCOL environment variable. > > > > Clients now advertise the full list of registered versions. Servers > > select the first recognized version from this advertisement. > > I like this patch from the users point of view (i.e. inside fetch/push etc), > and I need to through the details in connect/protocol as that seems > to be a lot of new code, but hides the complexity very well. > > > + register_allowed_protocol_version(protocol_v2); > > + register_allowed_protocol_version(protocol_v1); > > + register_allowed_protocol_version(protocol_v0); > > Would it make sense to have a > > register_allowed_protocol_versions(protocol_v2, protocol_v1, > protocol_v0, NULL); > > similar to argv_array_pushl or would that be overengineered? Hmm, well it wouldn't be quite as clean as the argv_* versions, since we can't take the pointer of an enum value, so we don't have a good stop-value for the va_list. I suppose we could use protocol_unknown_version, but that seems unintuitive to me. We could also pass in an explicit argument count, but... ugh. Am I missing some other way to do this cleanly? I'll admit I'm not very familiar with va_lists. > > @@ -1085,10 +1085,9 @@ static struct child_process *git_connect_git(int > > fd[2], char *hostandport, > > target_host, 0); > > > > /* If using a new version put that stuff here after a second null > > byte */ > > - if (version > 0) { > > + if (strcmp(version_advert->buf, "version=0")) { > > strbuf_addch(, '\0'); > > - strbuf_addf(, "version=%d%c", > > - version, '\0'); > > + strbuf_addf(, "%s%c", version_advert->buf, '\0'); > > It's a bit unfortunate to pass around the string, but reading through > the following > lines seems like it is easiest. > > > > @@ -1226,16 +1226,10 @@ struct child_process *git_connect(int fd[2], const > > char *url, > > { > > char *hostandport, *path; > > struct child_process *conn; > > + struct strbuf version_advert = STRBUF_INIT; > > enum protocol protocol; > > - enum protocol_version version = get_protocol_version_config(); > > > > - /* > > -* NEEDSWORK: If we are trying to use protocol v2 and we are > > planning > > -* to perform a push, then fallback to v0 since the client doesn't > > know > > -* how to push yet using v2. > > -*/ > > - if (version == protocol_v2 && !strcmp("git-receive-pack", prog)) > > - version = protocol_v0; > > + get_client_protocol_version_advertisement(_advert); > > Nice to have this special casing gone! > > > diff --git a/protocol.c b/protocol.c > > index 5e636785d1..f788269c47 100644 > > + > > +static const char protocol_v0_string[] = "0"; > > +static const char protocol_v1_string[] = "1"; > > +static const char protocol_v2_string[] = "2"; > > + > ... > > +/* Return the text representation of a wire protocol version. */ > > +static const char *format_protocol_version(enum protocol_version version) > > +{ > > + switch (version) { > > + case protocol_v0: > > + return protocol_v0_string; > > + case protocol_v1: > > + return protocol_v1_string; > > + case protocol_v2: > > + return protocol_v2_string; > > + case protocol_unknown_version: > > + die(_("Unrecognized protocol version")); > > + } > > + die
Re: [PATCH 1/1] archive: init archivers before determining format
On 2018.10.19 19:59, Jeff King wrote: > On Fri, Oct 19, 2018 at 04:19:28PM -0700, stead...@google.com wrote: > > > diff --git a/builtin/archive.c b/builtin/archive.c > > index e74f675390..dd3283a247 100644 > > --- a/builtin/archive.c > > +++ b/builtin/archive.c > > @@ -45,7 +45,10 @@ static int run_remote_archiver(int argc, const char > > **argv, > > * it. > > */ > > if (name_hint) { > > - const char *format = archive_format_from_filename(name_hint); > > + const char *format; > > + init_tar_archiver(); > > + init_zip_archiver(); > > + format = archive_format_from_filename(name_hint); > > if (format) > > packet_write_fmt(fd[1], "argument --format=%s\n", > > format); > > Hrm. This code was added back in 56baa61d01 (archive: move file > extension format-guessing lower, 2011-06-21), and your example > invocation worked back then! > > Unfortunately it was broken by the very next patch in the series, > 08716b3c11 (archive: refactor file extension format-guessing, > 2011-06-21). I guess that's what I get for not adding regression tests. > > It's probably worth mentioning those points in the commit message. Done. > Does this work with configured archiver extensions, too? I think so, > because we load them via init_tar_archiver(). If you mean things like .tgz and .tar.gz, then yes, they are affected by the bug as well, and this patch fixes them. The test included in v2 uses a .tgz file. > Can we avoid repeating the list of archivers here? This needs to stay in > sync with the list in write_archive(). I know there are only two, but > can we factor out an init_archivers() call or something? Done. > We also should probably just call it unconditionally when we start the > archiver command (I don't think there are any other bugs like this > lurking, but it doesn't cost very much to initialize these; it makes > sense to just do it early). Done. > Other than those minor points (and the lack of test), your fix looks > good to me. Thanks for the review!
Re: [PATCH 0/1] Fix format detection when archiving remotely
On 2018.10.19 19:41, Jeff King wrote: > On Fri, Oct 19, 2018 at 04:19:27PM -0700, stead...@google.com wrote: > > > Currently, git-archive does not properly determine the desired archive > > format when both --output and --remote are provided, because > > run_remote_archiver() does not initialize the archivers prior to calling > > archive_format_from_filename(). This results in the remote archiver > > always returning a TAR file, regardless of the requested format. > > > > This patch initializes the TAR and ZIP archivers before calling > > archive_format_from_filename(), which fixes format detection. > > It seems like some of this content could be in the commit message of the > actual patch. Ack. I'll be sending v2 shortly, please let me know if I've missed anything that should be included. > > Steps to reproduce: > > > > ∫ git version > > git version 2.19.1.568.g152ad8e336-goog > > ∫ cd ~/src/git > > ∫ git archive --output ~/good.zip HEAD > > ∫ file ~/good.zip > > /home/steadmon/good.zip: Zip archive data, at least v1.0 to extract > > ∫ git archive --output ~/bad.zip --remote=. HEAD > > ∫ file ~/bad.zip > > /home/steadmon/bad.zip: POSIX tar archive > > And this could be in a test script in the actual patch. :) Done.
Re: What's cooking in git.git (Oct 2018, #02; Sat, 13)
On 2018.10.12 23:53, Junio C Hamano wrote: > * js/remote-archive-v2 (2018-09-28) 4 commits > (merged to 'next' on 2018-10-12 at 5f34377f60) > + archive: allow archive over HTTP(S) with proto v2 > + archive: implement protocol v2 archive command > + archive: use packet_reader for communications > + archive: follow test standards around assertions > > The original implementation of "git archive --remote" more or less > bypassed the transport layer and did not work over http(s). The > version 2 of the protocol is defined to allow going over http(s) as > well as Git native transport. > > Will merge to 'master'. The first two patches (test cleanup and packet_reader refactor) are OK, but the latter two will break the archive command when an old client attempts to talk to a new server (due to the version advertisement problem noted in [1]). Sorry that I didn't catch that these had made it into next. [1]: https://public-inbox.org/git/20180927223314.ga230...@google.com/#t
Re: [RFC PATCH 1/2] fuzz: Add basic fuzz testing target.
On 2018.10.10 11:14, Junio C Hamano wrote: > Josh Steadmon writes: > > > +FUZZ_OBJS += fuzz-pack-headers.o > > + > > +FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS)) > > + > > ... > > +### Fuzz testing > > +# > > +.PHONY: fuzz-clean fuzz-objs fuzz-compile > > I take it that you anticipate the fuzz programs in the future all > be named fuzz-$(blah), whose source is fuzz-$(blah).o (even though > we may grow some common code that may be linked with them, which can > be done by tweaking the rule for the $(FUZZ_PROGRAMS) target). Am I > reading you correctly? Would fuzz-{clean,objs,compile} risk squatting > on nicer names we may want to use for $(blah) down the line? Yes, that's correct. I've reworked the rules to be more compatible with how OSS-Fuzz expects to build these targets, and now "fuzz-all" is the only remaining special target. > > + ... > > +$(FUZZ_PROGRAMS): fuzz-compile > > + clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \ > > + $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@ > > Is the expected usage pattern to know a single fuzz-* program the > builder wants to build, to run "make fuzz-pack-headers"? If not, it > also would be a good idea to have something like > > fuzz-build-all:: $(FUZZ_PROGRAMS) > .PHONY: fuzz-build-all > > perhaps? > > Also, in the final version we unleash to general developer audience, > we'd want to support "make V=1" (and "make" that is "$(QUIET)"). Done and done.
Re: [PATCH 1/1] protocol: limit max protocol version per service
On 2018.10.12 16:32, Jonathan Nieder wrote: > Josh Steadmon wrote: > > > For now, I'm going to try adding an --allowed_versions flag for the > > remote helpers, but if anyone has a better idea, let me know. > > I forgot to mention: the stateless-connect remote helper capability is > still experimental so you're free to change its interface (e.g. to > change the syntax of the stateless-connect command it provides). For v2 of this patch, I ended up using GIT_PROTOCOL to pass the version string to the remote helpers.
Re: [PATCH 1/1] protocol: limit max protocol version per service
On 2018.10.05 12:25, Stefan Beller wrote: > > > > I suppose if we are strict about serving from a single endpoint, the > > > > version registry makes more sense, and individual operations can declare > > > > acceptable version numbers before calling any network code? > > > > > > Ah yeah, that makes sense! > > > > Thinking out loud here. Please let me know if I say something stupid :) > > > > So we'll have (up to) three pieces of version information we'll care > > about for version negotiation: > > > > 1. (maybe) a client-side protocol.version config entry > > (and in case we don't, we have it implicit ly hardcoded, as > we have to choose the default for users that don't care themselves about > this setting) > > > 2. a list of acceptable proto versions for the currently running > >operation on the client > > 3. a list of acceptable proto versions for the server endpoint that > >handles the request > > Yes that matches my understanding. The issue is between (1) and (2) > as (1) is in a generic config, whereas (2) is specific to the command, > such that it may differ. And as a user you may want to express things > like: "use the highest version", which is done by setting (1) to "version 2" > despite (2) not having support of all commands for v2. > > > According to the doc on protocol.version: "If set, clients will attempt > > to communicate with a server using the specified protocol version. If > > unset, no attempt will be made by the client to communicate using a > > particular protocol version, this results in protocol version 0 being > > used." > > > > So, if protocol.version is not set, or set to 0, the client should not > > attempt any sort of version negotiation. > > Yes, as version 0 is unaware of versions, i.e. there are old installations > out there where all the versioning code is not there, so in case of an > old client the new server *must* speak v0 to be able to communicate > (and vice versa). > > > > Otherwise, the client prefers a > > particular version, but we don't guarantee that they will actually use > > that version after the (unspecified) version negotiation procedure. > > > > If protocol.version is set to something other than 0, we construct a > > list of acceptable versions for the given operation. If the > > protocol.version entry is present in that list, we move it to the front > > of the list to note that it is the preferred version. We send all of > > these, in order, in the request. > > > > When the server endpoint begins to handle a request, it first constructs > > a list of acceptable versions. If the client specifies a list of > > versions, we check them one-by-one to see if they are acceptable. If we > > find a match, we use that version. If we don't find any matches or if > > the client did not send a version list, we default to v0. > > > > Seem reasonable? > > This sounds super reasonable! So this runs into problems with remote-curl (and possibly other remote helpers): builtin/push.c can declare whatever allowed versions it wants, but those are not carried over when remote-curl is started to actually talk to the remote. What's worse, remote-curl starts its HTTP connection before it knows what command it's actually acting as a helper for. For now, I'm going to try adding an --allowed_versions flag for the remote helpers, but if anyone has a better idea, let me know. Thanks, Josh
Re: [PATCH 1/1] protocol: limit max protocol version per service
On 2018.10.03 15:47, Stefan Beller wrote: > On Wed, Oct 3, 2018 at 2:34 PM Josh Steadmon wrote: > > > > Is there a method or design for advertising multiple acceptable versions > > from the client? > > I think the client can send multiple versions, looking through protocol.c > (and not the documentation as I should for this:) > > /* >* Determine which protocol version the client has requested. Since >* multiple 'version' keys can be sent by the client, indicating that >* the client is okay to speak any of them, select the greatest version >* that the client has requested. This is due to the assumption that >* the most recent protocol version will be the most state-of-the-art. >*/ > ... > const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT); > string_list_split(, git_protocol, ':', -1); > ... > for_each_string_list_item(item, ) { > ... > if (skip_prefix(item->string, "version=", )) > ... > > in determine_protocol_version_server which already had the client > speak to it, so I think at least the server can deal with multiple versions. > > But given that we transport the version in env variables, we'd > need to be extra careful if we just did not see the version > in the --remote=. above? Sorry, I'm not sure I understand this. What about env variables requires caution? > > From my understanding, we can only add a single > > version=X field in the advertisement, but IIUC we can extend this fairly > > easily? Perhaps we can have "version=X" to mean the preferred version, > > and then a repeatable "acceptable_version=Y" field or similar? > > Just re-use "version X", separated by colons as above. > > > > From a maintenance perspective, do we want to keep > > > this part of the code central, as it ties protocol (as proxied > > > by service name) to the max version number? > > > I would think that we'd rather have the decision local to the > > > code, i.e. builtin/fetch would need to tell protocol.c that it > > > can do (0,1,2) and builtin/push can do (0,1), and then the > > > networking layers of code would figure out by the input > > > from the caller and the input from the user (configured > > > protocol.version) what is the best to go forward from > > > then on. > > > > I like having it centralized, because enforcing this in git_connect() > > and discover_refs() catches all the outgoing version advertisements, but > > there's lots of code paths that lead to those two functions that would > > all have to have the acceptable version numbers plumbed through. > > Makes sense. > > > I suppose we could also have a registry of services to version numbers, > > but I tend to dislike non-local sources of data. But if the list likes > > that approach better, I'll be happy to implement it. > > > > But I guess having the central place here is not to > > > bad either. How will it cope with the desire of protocol v2 > > > to have only one end point (c.f. serve.{c,h} via builtin/serve > > > as "git serve") ? > > > > I'm not sure about this. In my series to add a v2 archive command, I > > added support for a new endpoint for proto v2 and I don't recall seeing > > any complaints, but that is still open for review. > > Ah I guess new end points would imply that you can speak at least > a given version N. > > > I suppose if we are strict about serving from a single endpoint, the > > version registry makes more sense, and individual operations can declare > > acceptable version numbers before calling any network code? > > Ah yeah, that makes sense! Thinking out loud here. Please let me know if I say something stupid :) So we'll have (up to) three pieces of version information we'll care about for version negotiation: 1. (maybe) a client-side protocol.version config entry 2. a list of acceptable proto versions for the currently running operation on the client 3. a list of acceptable proto versions for the server endpoint that handles the request According to the doc on protocol.version: "If set, clients will attempt to communicate with a server using the specified protocol version. If unset, no attempt will be made by the client to communicate using a particular protocol version, this results in protocol version 0 being used." So, if protocol.version is not set, or set to 0, the client should not attempt any sort of version negotiation. Otherwise, the client prefers a particular version, but we don't guarantee that they will actually use that version after the (unspecified) version nego
[RFC PATCH 2/2] fuzz: Add fuzz testing for packfile indices.
Signed-off-by: Josh Steadmon --- .gitignore | 1 + Makefile| 5 - fuzz-pack-idx.c | 13 + packfile.c | 44 +--- packfile.h | 13 + 5 files changed, 56 insertions(+), 20 deletions(-) create mode 100644 fuzz-pack-idx.c diff --git a/.gitignore b/.gitignore index 87a28b3115..64b3377d40 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,6 @@ /fuzz_corpora /fuzz-pack-headers +/fuzz-pack-idx /GIT-BUILD-OPTIONS /GIT-CFLAGS /GIT-LDFLAGS diff --git a/Makefile b/Makefile index 10bb82b115..f7e6be8f19 100644 --- a/Makefile +++ b/Makefile @@ -685,6 +685,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \ ETAGS_TARGET = TAGS FUZZ_OBJS += fuzz-pack-headers.o +FUZZ_OBJS += fuzz-pack-idx.o FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS)) @@ -3071,7 +3072,7 @@ cover_db_html: cover_db ### Fuzz testing # -.PHONY: fuzz-clean fuzz-objs fuzz-compile +.PHONY: fuzz-clean fuzz-objs fuzz-compile fuzz-all FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address FUZZ_LDFLAGS = $(FUZZ_CFLAGS) @@ -3089,3 +3090,5 @@ fuzz-compile: $(FUZZ_PROGRAMS): fuzz-compile clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \ $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@ + +fuzz-all: $(FUZZ_PROGRAMS) diff --git a/fuzz-pack-idx.c b/fuzz-pack-idx.c new file mode 100644 index 00..0c3d777aac --- /dev/null +++ b/fuzz-pack-idx.c @@ -0,0 +1,13 @@ +#include "object-store.h" +#include "packfile.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + struct packed_git p; + + load_idx("fuzz-input", GIT_SHA1_RAWSZ, (void *)data, size, ); + + return 0; +} diff --git a/packfile.c b/packfile.c index 841b36182f..86074a76e9 100644 --- a/packfile.c +++ b/packfile.c @@ -80,10 +80,8 @@ void pack_report(void) static int check_packed_git_idx(const char *path, struct packed_git *p) { void *idx_map; - struct pack_idx_header *hdr; size_t idx_size; - uint32_t version, nr, i, *index; - int fd = git_open(path); + int fd = git_open(path), ret; struct stat st; const unsigned int hashsz = the_hash_algo->rawsz; @@ -101,16 +99,32 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - hdr = idx_map; + ret = load_idx(path, hashsz, idx_map, idx_size, p); + + if (ret) + munmap(idx_map, idx_size); + + return ret; +} + +int load_idx(const char *path, const unsigned int hashsz, void *idx_map, +size_t idx_size, struct packed_git *p) +{ + struct pack_idx_header *hdr = idx_map; + uint32_t version, nr, i, *index; + + if (idx_size < 4 * 256 + hashsz + hashsz) + return error("index file %s is too small", path); + if (idx_map == NULL) + return error("empty data"); + if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) { version = ntohl(hdr->idx_version); - if (version < 2 || version > 2) { - munmap(idx_map, idx_size); + if (version < 2 || version > 2) return error("index file %s is version %"PRIu32 " and is not supported by this binary" " (try upgrading GIT to a newer version)", path, version); - } } else version = 1; @@ -120,10 +134,8 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) index += 2; /* skip index header */ for (i = 0; i < 256; i++) { uint32_t n = ntohl(index[i]); - if (n < nr) { - munmap(idx_map, idx_size); + if (n < nr) return error("non-monotonic index %s", path); - } nr = n; } @@ -135,10 +147,8 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) * - hash of the packfile * - file checksum */ - if (idx_size != 4*256 + nr * (hashsz + 4) + hashsz + hashsz) { - munmap(idx_map, idx_size); + if (idx_size != 4 * 256 + nr * (hashsz + 4) + hashsz + hashsz) return error("wrong index v1 file size in %s", path); - } } else if (version == 2) { /* * Minimum size: @@ -157,20 +167,16 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) unsigned long
[RFC PATCH 0/2] add fuzzing targets for use with LLVM libFuzzer
libFuzzer[1] is a fuzzing engine included in recent versions of LLVM. It is used by OSS-Fuzz[2] for continuous fuzzing of OSS projects. This series adds two basic fuzzing targets covering packfile header and index code. It is not particularly portable, and requires the use of LLVM v4.0 (the latest version available on my workstation). I would particularly appreciate advice on how to make the Makefile more portable. [1]: https://llvm.org/docs/LibFuzzer.html [2]: https://github.com/google/oss-fuzz Josh Steadmon (2): fuzz: Add basic fuzz testing target. fuzz: Add fuzz testing for packfile indices. .gitignore | 3 +++ Makefile| 33 - fuzz-pack-headers.c | 14 ++ fuzz-pack-idx.c | 13 + packfile.c | 44 +--- packfile.h | 13 + 6 files changed, 100 insertions(+), 20 deletions(-) create mode 100644 fuzz-pack-headers.c create mode 100644 fuzz-pack-idx.c -- 2.19.0.605.g01d371f741-goog
[RFC PATCH 1/2] fuzz: Add basic fuzz testing target.
Signed-off-by: Josh Steadmon --- .gitignore | 2 ++ Makefile| 30 +- fuzz-pack-headers.c | 14 ++ 3 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 fuzz-pack-headers.c diff --git a/.gitignore b/.gitignore index 9d1363a1eb..87a28b3115 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,5 @@ +/fuzz_corpora +/fuzz-pack-headers /GIT-BUILD-OPTIONS /GIT-CFLAGS /GIT-LDFLAGS diff --git a/Makefile b/Makefile index 13e1c52478..10bb82b115 100644 --- a/Makefile +++ b/Makefile @@ -590,6 +590,8 @@ XDIFF_OBJS = VCSSVN_OBJS = GENERATED_H = EXTRA_CPPFLAGS = +FUZZ_OBJS = +FUZZ_PROGRAMS = LIB_OBJS = PROGRAM_OBJS = PROGRAMS = @@ -682,6 +684,10 @@ SCRIPTS = $(SCRIPT_SH_INS) \ ETAGS_TARGET = TAGS +FUZZ_OBJS += fuzz-pack-headers.o + +FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS)) + # Empty... EXTRA_PROGRAMS = @@ -2250,6 +2256,7 @@ TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \ $(XDIFF_OBJS) \ $(VCSSVN_OBJS) \ + $(FUZZ_OBJS) \ common-main.o \ git.o ifndef NO_CURL @@ -2931,7 +2938,7 @@ profile-clean: cocciclean: $(RM) contrib/coccinelle/*.cocci.patch* -clean: profile-clean coverage-clean cocciclean +clean: profile-clean coverage-clean cocciclean fuzz-clean $(RM) *.res $(RM) $(OBJECTS) $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) @@ -3061,3 +3068,24 @@ cover_db: coverage-report cover_db_html: cover_db cover -report html -outputdir cover_db_html cover_db + +### Fuzz testing +# +.PHONY: fuzz-clean fuzz-objs fuzz-compile + +FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address +FUZZ_LDFLAGS = $(FUZZ_CFLAGS) + + +fuzz-clean: + $(RM) $(FUZZ_PROGRAMS) $(FUZZ_OBJS) + +fuzz-objs: $(FUZZ_OBJS) + +fuzz-compile: + $(MAKE) CC=clang LD=clang CFLAGS="$(FUZZ_CFLAGS)" \ + LDFLAGS="$(FUZZ_LDFLAGS)" all fuzz-objs + +$(FUZZ_PROGRAMS): fuzz-compile + clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \ + $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@ diff --git a/fuzz-pack-headers.c b/fuzz-pack-headers.c new file mode 100644 index 00..99da1d0fd3 --- /dev/null +++ b/fuzz-pack-headers.c @@ -0,0 +1,14 @@ +#include "packfile.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + enum object_type type; + unsigned long len; + + unpack_object_header_buffer((const unsigned char *)data, + (unsigned long)size, , ); + + return 0; +} -- 2.19.0.605.g01d371f741-goog
Re: [PATCH 1/1] protocol: limit max protocol version per service
On 2018.10.02 15:28, Stefan Beller wrote: > On Tue, Oct 2, 2018 at 3:00 PM Josh Steadmon wrote: > > > > For services other than git-receive-pack, clients currently advertise > > that they support the version set in the protocol.version config, > > regardless of whether or not there is actually an implementation of that > > service for the given protocol version. This causes backwards- > > compatibility problems when a new implementation for the given > > protocol version is added. > > > > This patch sets maximum allowed protocol versions for git-receive-pack, > > git-upload-archive, and git-upload-pack. > > > > Previously, git-receive-pack would downgrade from v2 to v0, but would > > allow v1 if set in protocol.version. Now, it will downgrade from v2 to > > v1. > > But does git-receive-pack understand v1? > As to my understanding we have not even defined v1 > for push (receive-pack) and archive --remote (upload-archive). > v1 is only known to fetch (upload-pack). > > > +enum protocol_version determine_maximum_protocol_version( > > + const char *service, enum protocol_version default_version) > > +{ > > + if (!strcmp(service, "git-receive-pack")) > > + return protocol_v1; > > + else if (!strcmp(service, "git-upload-archive")) > > + return protocol_v1; > > so I would think these two would be _v0. > ... goes and checks ... > aa9bab29b8 (upload-pack, receive-pack: introduce protocol version 1, > 2017-10-16) seems to actually teach v1 to receive-pack as well, > but upload-archive was completely off radar, so I think returning > (v1, v0, v2 in the order as in the code) would make sense? I believe that git-upload-archive can still speak version 1 without any trouble, but it at least doesn't break anything in the test suite to limit this to v0 either. > Asides from this, I thought there was a deliberate decision > that we'd want to avoid a strict order on the protocol versions, > but I could not find prior discussion on list to back up this claim. :/ > > For example we'd go with e.g. enums instead of integers > for version numbers, as then some internal setup could > also have things like protocol_v2018-10-02 or protocol_vWhatever; > some protocol version may be advantageous to the client, some to > the server, and we'd need to negotiate the best version that both > are happy with. (e.g. the server may like version 0, 2 and 3, and > the client may like 0,2,4 as 3 is bad security wise for the client, > so both would negotiate to 2 as their best case) Is there a method or design for advertising multiple acceptable versions from the client? From my understanding, we can only add a single version=X field in the advertisement, but IIUC we can extend this fairly easily? Perhaps we can have "version=X" to mean the preferred version, and then a repeatable "acceptable_version=Y" field or similar? > From a maintenance perspective, do we want to keep > this part of the code central, as it ties protocol (as proxied > by service name) to the max version number? > I would think that we'd rather have the decision local to the > code, i.e. builtin/fetch would need to tell protocol.c that it > can do (0,1,2) and builtin/push can do (0,1), and then the > networking layers of code would figure out by the input > from the caller and the input from the user (configured > protocol.version) what is the best to go forward from > then on. I like having it centralized, because enforcing this in git_connect() and discover_refs() catches all the outgoing version advertisements, but there's lots of code paths that lead to those two functions that would all have to have the acceptable version numbers plumbed through. I suppose we could also have a registry of services to version numbers, but I tend to dislike non-local sources of data. But if the list likes that approach better, I'll be happy to implement it. > But I guess having the central place here is not to > bad either. How will it cope with the desire of protocol v2 > to have only one end point (c.f. serve.{c,h} via builtin/serve > as "git serve") ? I'm not sure about this. In my series to add a v2 archive command, I added support for a new endpoint for proto v2 and I don't recall seeing any complaints, but that is still open for review. I suppose if we are strict about serving from a single endpoint, the version registry makes more sense, and individual operations can declare acceptable version numbers before calling any network code? Thanks for the review, Josh
[PATCH 1/1] protocol: limit max protocol version per service
For services other than git-receive-pack, clients currently advertise that they support the version set in the protocol.version config, regardless of whether or not there is actually an implementation of that service for the given protocol version. This causes backwards- compatibility problems when a new implementation for the given protocol version is added. This patch sets maximum allowed protocol versions for git-receive-pack, git-upload-archive, and git-upload-pack. Previously, git-receive-pack would downgrade from v2 to v0, but would allow v1 if set in protocol.version. Now, it will downgrade from v2 to v1. Signed-off-by: Josh Steadmon --- connect.c | 11 --- protocol.c| 13 + protocol.h| 7 +++ remote-curl.c | 11 --- 4 files changed, 28 insertions(+), 14 deletions(-) diff --git a/connect.c b/connect.c index 94547e5056..4a8cd78239 100644 --- a/connect.c +++ b/connect.c @@ -1228,14 +1228,11 @@ struct child_process *git_connect(int fd[2], const char *url, struct child_process *conn; enum protocol protocol; enum protocol_version version = get_protocol_version_config(); + enum protocol_version max_version; - /* -* NEEDSWORK: If we are trying to use protocol v2 and we are planning -* to perform a push, then fallback to v0 since the client doesn't know -* how to push yet using v2. -*/ - if (version == protocol_v2 && !strcmp("git-receive-pack", prog)) - version = protocol_v0; + max_version = determine_maximum_protocol_version(prog, version); + if (version > max_version) + version = max_version; /* Without this we cannot rely on waitpid() to tell * what happened to our children. diff --git a/protocol.c b/protocol.c index 5e636785d1..1c553d8b99 100644 --- a/protocol.c +++ b/protocol.c @@ -79,3 +79,16 @@ enum protocol_version determine_protocol_version_client(const char *server_respo return version; } + +enum protocol_version determine_maximum_protocol_version( + const char *service, enum protocol_version default_version) +{ + if (!strcmp(service, "git-receive-pack")) + return protocol_v1; + else if (!strcmp(service, "git-upload-archive")) + return protocol_v1; + else if (!strcmp(service, "git-upload-pack")) + return protocol_v2; + + return default_version; +} diff --git a/protocol.h b/protocol.h index 2ad35e433c..eabc0c5fab 100644 --- a/protocol.h +++ b/protocol.h @@ -31,4 +31,11 @@ extern enum protocol_version determine_protocol_version_server(void); */ extern enum protocol_version determine_protocol_version_client(const char *server_response); +/* + * Used by a client to determine the maximum protocol version to advertise for a + * particular service. If the service is unrecognized, return default_version. + */ +extern enum protocol_version determine_maximum_protocol_version( + const char *service, enum protocol_version default_version); + #endif /* PROTOCOL_H */ diff --git a/remote-curl.c b/remote-curl.c index fb28309e85..028adb76ae 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -344,6 +344,7 @@ static struct discovery *discover_refs(const char *service, int for_push) int http_ret, maybe_smart = 0; struct http_get_options http_options; enum protocol_version version = get_protocol_version_config(); + enum protocol_version max_version; if (last && !strcmp(service, last->service)) return last; @@ -360,13 +361,9 @@ static struct discovery *discover_refs(const char *service, int for_push) strbuf_addf(_url, "service=%s", service); } - /* -* NEEDSWORK: If we are trying to use protocol v2 and we are planning -* to perform a push, then fallback to v0 since the client doesn't know -* how to push yet using v2. -*/ - if (version == protocol_v2 && !strcmp("git-receive-pack", service)) - version = protocol_v0; + max_version = determine_maximum_protocol_version(service, version); + if (version > max_version) + version = max_version; /* Add the extra Git-Protocol header */ if (get_protocol_http_header(version, _header)) -- 2.19.0.605.g01d371f741-goog
[PATCH 0/1] Limit client version advertisements
As discussed in [1], clients will incorrectly advertise support for protocol version 2 even when the service in question does not have a v2 implementation. This patch sets maximum protocol versions for git-receive-pack, git-upload-archive, and git-upload-pack. [1]: https://public-inbox.org/git/20180927012455.234876-1-stead...@google.com/ Josh Steadmon (1): protocol: limit max protocol version per service connect.c | 11 --- protocol.c| 13 + protocol.h| 7 +++ remote-curl.c | 11 --- 4 files changed, 28 insertions(+), 14 deletions(-) -- 2.19.0.605.g01d371f741-goog
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
On 2018.09.27 15:20, Junio C Hamano wrote: > Jonathan Nieder writes: > > > 1. Clients sending version=2 when they do not, in fact, speak protocol > > v2 for a service is a (serious) bug. (Separately from this > > series) we should fix it. > > > > 2. That bug is already in the wild, alas. Fortunately the semantics of > > GIT_PROTOCOL as a list of key/value pairs is well defined. So we > > have choices of (a) bump version to version=3 (b) pass another > > value 'version=2:yesreallyversion=2' (c) etc. > > > > 3. This is likely to affect push, too. > > Do you mean that existing "git push", "git fetch" and "git archive" > sends version=2 even when they are not capable of speaking protocol > v2? I thought that "git archive [--remote]" was left outside of the > protocol update (that was the reason why the earlier attempt took a > hacky route of "shallow clone followed by local archive"), so there > is no "git archive" in the wild that can even say "version=$n" > (which requires you to be at least version=1)? Yes, the version on my desktop sends version=2 when archiving: ∫ which git /usr/bin/git ∫ git --version git version 2.19.0.605.g01d371f741-goog ∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \ --enable=upload-archive \ --base-path=${HOME}/src/bare-repos & [1] 258496 ∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar ∫ grep version ~/server_trace 15:31:22.377869 pkt-line.c:80 packet: git< git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0
Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
On 2018.09.13 09:47, Junio C Hamano wrote: > Josh Steadmon writes: > > > Signed-off-by: Josh Steadmon > > --- > > builtin/archive.c | 8 +++- > > http-backend.c | 10 +- > > transport-helper.c | 5 +++-- > > 3 files changed, 19 insertions(+), 4 deletions(-) > > > > diff --git a/builtin/archive.c b/builtin/archive.c > > index 73831887d..5fa75b3f7 100644 > > --- a/builtin/archive.c > > +++ b/builtin/archive.c > > @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char > > **argv, > > status = packet_reader_read(); > > if (status != PACKET_READ_FLUSH) > > die(_("git archive: expected a flush")); > > - } > > + } else if (version == protocol_v2 && > > + starts_with(transport->url, "http")) > > + /* > > +* Commands over HTTP require two requests, so there's an > > +* additional server response to parse. > > +*/ > > + discover_version(); > > What should happen if the version discovered here is different from > v2 or the capabilities offered are different from what we saw > before? Perhaps we need some sanity checks, as we on this side of > the connection know we are making two requests, and may even end up > talking with an instance of "upload-archive" that is different from > the one we talked with earlier. > > > diff --git a/http-backend.c b/http-backend.c > > index 458642ef7..d62d583c7 100644 > > --- a/http-backend.c > > +++ b/http-backend.c > > @@ -32,6 +32,7 @@ struct rpc_service { > > static struct rpc_service rpc_service[] = { > > { "upload-pack", "uploadpack", 1, 1 }, > > { "receive-pack", "receivepack", 0, -1 }, > > + { "upload-archive", "uploadarchive", 1, 1 }, > > }; > > > > static struct string_list *get_parameters(void) > > @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char > > *service_name) > > struct rpc_service *svc = select_service(hdr, service_name); > > struct strbuf buf = STRBUF_INIT; > > > > + if (!strcmp(service_name, "git-upload-archive")) { > > + /* git-upload-archive doesn't need --stateless-rpc */ > > + argv[1] = "."; > > + argv[2] = NULL; > > + } > > + > > strbuf_reset(); > > strbuf_addf(, "application/x-git-%s-request", svc->name); > > check_content_type(hdr, buf.buf); > > @@ -713,7 +720,8 @@ static struct service_cmd { > > {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}, > > > > {"POST", "/git-upload-pack$", service_rpc}, > > - {"POST", "/git-receive-pack$", service_rpc} > > + {"POST", "/git-receive-pack$", service_rpc}, > > + {"POST", "/git-upload-archive$", service_rpc}, > > }; > > > > static int bad_request(struct strbuf *hdr, const struct service_cmd *c) > > diff --git a/transport-helper.c b/transport-helper.c > > index 143ca008c..b4b96fc89 100644 > > --- a/transport-helper.c > > +++ b/transport-helper.c > > @@ -605,7 +605,8 @@ static int process_connect_service(struct transport > > *transport, > > ret = run_connect(transport, ); > > } else if (data->stateless_connect && > >(get_protocol_version_config() == protocol_v2) && > > - !strcmp("git-upload-pack", name)) { > > + (!strcmp("git-upload-pack", name) || > > + !strcmp("git-upload-archive", name))) { > > strbuf_addf(, "stateless-connect %s\n", name); > > ret = run_connect(transport, ); > > if (ret) > > @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, > > const char *name, > > > > /* Get_helper so connect is inited. */ > > get_helper(transport); > > - if (!data->connect) > > + if (!data->connect && !data->stateless_connect) > > die(_("operation not supported by protocol")); > > This is somewhat curious. The upload-pack going over HTTP also is > triggered by the same "stateless-connect" remote helper command, as > we just saw in the previous hunk, and that support is not new. Why > do we need this change then? What's different between the "data" > that is obtained by get_helper(transport) for driving upload-pack > and upload-archive? Presumably upload-pack was working without this > change because it also sets the connect bit on, and upload-archive > does not work without this change because it does not? Why do these > two behave differently? The data struct is not different between upload-pack vs. upload-archive. Neither of them set the connect bit. The difference is that upload-pack doesn't need to call transport_connect(), so it never hits connect_helper().
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
On 2018.09.27 11:20, Stefan Beller wrote: > On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon wrote: > > > > This is the second version of my series to add a new protocol v2 command > > for archiving, with support for HTTP(S). > > > > NEEDSWORK: a server built with this series is not backwards-compatible > > with clients that set GIT_PROTOCOL=version=2 or configure > > protocol.version=2. The old client will unconditionally send "argument > > ..." packet lines, which breaks the server's expectations of a > > "command=archive" request, > > So if an old client sets protocol to v2, it would only apply that > protocol version > to fetch, not archive, so it would start following a v0 conversation, but > as the protocol version is set, it would be transmitted to the server. > This sounds like a bug in the client? Yeah, basically. We're telling the server we support v2, even if the specific operation we're trying to do doesn't have a v2 implementation on the client. So this is going to make it ugly to replace existing commands. > > while the server's capability advertisement > > in turn breaks the clients expectation of either an ACK or NACK. > > Could a modern client send either another protocol version (3?) > or a special capability along the protocol version ("fixed_archive") > > > I've been discussing workarounds for this with Jonathan Nieder, but > > please let me know if you have any suggestions for v3 of this series. > > Care to open the discussion to the list? What are the different > approaches, what are the pros/cons? Jonathan suggested something along the lines of what you said above, adding a new field in GIT_PROTOCOL. So we'd send something like "version=2:archive_version=2" and have the server detect the latter. I'm not sure if that's the best way to go about this since I'm not familiar with the version detection code for other parts of the system. I worry that it will lead us down the path of having to specify a version for every command that we eventually convert to protocol v2. On the other hand, I don't see any other way to work around this, at least in the archive case. We can't peek at the client's transmissions on the server, because v2 requires that the server speaks first...
[PATCH v2 4/4] archive: allow archive over HTTP(S) with proto v2
Signed-off-by: Josh Steadmon --- builtin/archive.c | 12 +++- http-backend.c | 13 - transport-helper.c | 7 --- 3 files changed, 27 insertions(+), 5 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index f91d222677..78a259518d 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -87,7 +87,17 @@ static int run_remote_archiver(int argc, const char **argv, status = packet_reader_read(); if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); - } + } else if (version == protocol_v2 && + (starts_with(transport->url, "http://;) || + starts_with(transport->url, "https://;))) + /* +* Commands over HTTP require two requests, so there's an +* additional server response to parse. We do only basic sanity +* checking here that the versions presented match across +* requests. +*/ + if (version != discover_version()) + die(_("git archive: received different protocol versions in subsequent requests")); /* Now, start reading from fd[0] and spit it out to stdout */ rv = recv_sideband("archive", fd[0], 1); diff --git a/http-backend.c b/http-backend.c index 9e894f197f..8e262d50e0 100644 --- a/http-backend.c +++ b/http-backend.c @@ -32,6 +32,7 @@ struct rpc_service { static struct rpc_service rpc_service[] = { { "upload-pack", "uploadpack", 1, 1 }, { "receive-pack", "receivepack", 0, -1 }, + { "upload-archive", "uploadarchive", 1, 1 }, }; static struct string_list *get_parameters(void) @@ -637,6 +638,15 @@ static void service_rpc(struct strbuf *hdr, char *service_name) struct rpc_service *svc = select_service(hdr, service_name); struct strbuf buf = STRBUF_INIT; + if (!strcmp(service_name, "git-upload-archive")) { + /* +* git-upload-archive doesn't need --stateless-rpc, because it +* always handles only a single request. +*/ + argv[1] = "."; + argv[2] = NULL; + } + strbuf_reset(); strbuf_addf(, "application/x-git-%s-request", svc->name); check_content_type(hdr, buf.buf); @@ -713,7 +723,8 @@ static struct service_cmd { {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}, {"POST", "/git-upload-pack$", service_rpc}, - {"POST", "/git-receive-pack$", service_rpc} + {"POST", "/git-receive-pack$", service_rpc}, + {"POST", "/git-upload-archive$", service_rpc}, }; static int bad_request(struct strbuf *hdr, const struct service_cmd *c) diff --git a/transport-helper.c b/transport-helper.c index 143ca008c8..c41c3dfcff 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -604,8 +604,9 @@ static int process_connect_service(struct transport *transport, strbuf_addf(, "connect %s\n", name); ret = run_connect(transport, ); } else if (data->stateless_connect && - (get_protocol_version_config() == protocol_v2) && - !strcmp("git-upload-pack", name)) { + get_protocol_version_config() == protocol_v2 && + (!strcmp("git-upload-pack", name) || + !strcmp("git-upload-archive", name))) { strbuf_addf(, "stateless-connect %s\n", name); ret = run_connect(transport, ); if (ret) @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name, /* Get_helper so connect is inited. */ get_helper(transport); - if (!data->connect) + if (!data->connect && !data->stateless_connect) die(_("operation not supported by protocol")); if (!process_connect_service(transport, name, exec)) -- 2.19.0.605.g01d371f741-goog
[PATCH v2 3/4] archive: implement protocol v2 archive command
This adds a new archive command for protocol v2. The command expects arguments in the form "argument X" which are passed unmodified to git-upload-archive--writer. This command works over the file://, Git, and SSH transports. HTTP support will be added in a separate patch. NEEDSWORK: this is not backwards-compatible with older clients that set GIT_PROTOCOL=version=2 or configure protocol.version=2. Signed-off-by: Josh Steadmon --- Documentation/technical/protocol-v2.txt | 21 +++- builtin/archive.c | 45 + builtin/upload-archive.c| 27 +-- serve.c | 7 t/t5000-tar-tree.sh | 7 t/t5701-git-serve.sh| 1 + 6 files changed, 90 insertions(+), 18 deletions(-) diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 09e4e0273f..6126fc5738 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -77,7 +77,8 @@ A v2 server would reply: S: Subsequent requests are then made directly to the service -`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack). +`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack or +git-upload-archive). Capability Advertisement -- @@ -168,6 +169,24 @@ printable ASCII characters except space (i.e., the byte range 32 < x < and debugging purposes, and MUST NOT be used to programmatically assume the presence or absence of particular features. + archive +~ + +`archive` is the command to request an archive (such as a tarball) from a server +over v2. + +archive takes the following optional arguments: + +argument + This specifies that should be passed as an argument to the + underlying archive writer. This option can be repeated to pass + additional arguments to the archive writer. + +See linkgit:git-archive[1] for allowed values of ``. + +The output of archive is a packet-line stream of the resulting archive, with +error messages transferred over the sideband channel. + ls-refs ~ diff --git a/builtin/archive.c b/builtin/archive.c index 4eb547c5b7..f91d222677 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -5,9 +5,11 @@ #include "cache.h" #include "builtin.h" #include "archive.h" +#include "connect.h" #include "transport.h" #include "parse-options.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" static void create_output_file(const char *output_file) @@ -23,6 +25,13 @@ static void create_output_file(const char *output_file) } } +static void do_v2_command_and_cap(int out) +{ + packet_write_fmt(out, "command=archive\n"); + /* Capability list would go here, if we wanted to request any. */ + packet_delim(out); +} + static int run_remote_archiver(int argc, const char **argv, const char *remote, const char *exec, const char *name_hint) @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv, struct remote *_remote; struct packet_reader reader; enum packet_read_status status; + enum protocol_version version; _remote = remote_get(remote); if (!_remote->url[0]) @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv, packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE); + version = discover_version(); + + if (version == protocol_v2 && server_supports_v2("archive", 0)) + do_v2_command_and_cap(fd[1]); + /* * Inject a fake --format field at the beginning of the * arguments, with the format inferred from our output @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv, packet_write_fmt(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); - status = packet_reader_read(); - - if (status != PACKET_READ_NORMAL || reader.pktlen <= 0) - die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(reader.line, "ACK")) { - if (starts_with(reader.line, "NACK ")) - die(_("git archive: NACK %s"), reader.line + 5); - if (starts_with(reader.line, "ERR ")) - die(_("remote error: %s"), reader.line + 4); - die(_("git archive: protocol error")); + if (version == protocol_v0) { + status = packet_reader_read(); + + if (status != PACKET_READ_NORMAL || reader.pktlen <= 0) + die(_("git archive: expec
[PATCH v2 1/4] archive: follow test standards around assertions
Move assertions outside of the check_tar function so that all top-level code is wrapped in a test_expect_* assertion. Signed-off-by: Josh Steadmon --- t/t5000-tar-tree.sh | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 2a97b27b0a..c408e3a23d 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -62,11 +62,9 @@ check_tar() { dir=$1 dir_with_prefix=$dir/$2 - test_expect_success ' extract tar archive' ' - (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile - ' + (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile && - test_expect_success TAR_NEEDS_PAX_FALLBACK ' interpret pax headers' ' + if test_have_prereq TAR_NEEDS_PAX_FALLBACK ; then ( cd $dir && for header in *.paxheader @@ -82,16 +80,11 @@ check_tar() { fi done ) - ' + fi && - test_expect_success ' validate filenames' ' - (cd ${dir_with_prefix}a && find .) | sort >$listfile && - test_cmp a.lst $listfile - ' - - test_expect_success ' validate file contents' ' - diff -r a ${dir_with_prefix}a - ' + (cd ${dir_with_prefix}a && find .) | sort >$listfile && + test_cmp a.lst $listfile && + diff -r a ${dir_with_prefix}a } test_expect_success \ @@ -143,19 +136,20 @@ test_expect_success \ 'git archive' \ 'git archive HEAD >b.tar' -check_tar b +test_expect_success 'extract archive' 'check_tar b' test_expect_success 'git archive --prefix=prefix/' ' git archive --prefix=prefix/ HEAD >with_prefix.tar ' -check_tar with_prefix prefix/ +test_expect_success 'extract with prefix' 'check_tar with_prefix prefix/' test_expect_success 'git-archive --prefix=olde-' ' git archive --prefix=olde- HEAD >with_olde-prefix.tar ' -check_tar with_olde-prefix olde- +test_expect_success 'extract with olde- prefix' \ + 'check_tar with_olde-prefix olde-' test_expect_success 'git archive on large files' ' test_config core.bigfilethreshold 1 && -- 2.19.0.605.g01d371f741-goog
[PATCH v2 2/4] archive: use packet_reader for communications
Using packet_reader will simplify version detection and capability handling, which will make implementation of protocol v2 support in git-archive easier. This refactoring does not change the behavior of "git archive". Signed-off-by: Josh Steadmon --- builtin/archive.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e74f675390..4eb547c5b7 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv, const char *remote, const char *exec, const char *name_hint) { - char *buf; int fd[2], i, rv; struct transport *transport; struct remote *_remote; + struct packet_reader reader; + enum packet_read_status status; _remote = remote_get(remote); if (!_remote->url[0]) @@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv, transport = transport_get(_remote, _remote->url[0]); transport_connect(transport, "git-upload-archive", exec, fd); + packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE); + /* * Inject a fake --format field at the beginning of the * arguments, with the format inferred from our output @@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv, packet_write_fmt(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); - buf = packet_read_line(fd[0], NULL); - if (!buf) + status = packet_reader_read(); + + if (status != PACKET_READ_NORMAL || reader.pktlen <= 0) die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(buf, "ACK")) { - if (starts_with(buf, "NACK ")) - die(_("git archive: NACK %s"), buf + 5); - if (starts_with(buf, "ERR ")) - die(_("remote error: %s"), buf + 4); + if (strcmp(reader.line, "ACK")) { + if (starts_with(reader.line, "NACK ")) + die(_("git archive: NACK %s"), reader.line + 5); + if (starts_with(reader.line, "ERR ")) + die(_("remote error: %s"), reader.line + 4); die(_("git archive: protocol error")); } - if (packet_read_line(fd[0], NULL)) + status = packet_reader_read(); + if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); /* Now, start reading from fd[0] and spit it out to stdout */ -- 2.19.0.605.g01d371f741-goog
[PATCH v2 0/4] Add proto v2 archive command with HTTP support
This is the second version of my series to add a new protocol v2 command for archiving, with support for HTTP(S). NEEDSWORK: a server built with this series is not backwards-compatible with clients that set GIT_PROTOCOL=version=2 or configure protocol.version=2. The old client will unconditionally send "argument ..." packet lines, which breaks the server's expectations of a "command=archive" request, while the server's capability advertisement in turn breaks the clients expectation of either an ACK or NACK. I've been discussing workarounds for this with Jonathan Nieder, but please let me know if you have any suggestions for v3 of this series. Josh Steadmon (4): archive: follow test standards around assertions archive: use packet_reader for communications archive: implement protocol v2 archive command archive: allow archive over HTTP(S) with proto v2 Documentation/technical/protocol-v2.txt | 21 - builtin/archive.c | 58 +++-- builtin/upload-archive.c| 27 ++-- http-backend.c | 13 +- serve.c | 7 +++ t/t5000-tar-tree.sh | 33 +++--- t/t5701-git-serve.sh| 1 + transport-helper.c | 7 +-- 8 files changed, 130 insertions(+), 37 deletions(-) Range-diff against v1: -: -- > 1: c2e371ad24 archive: follow test standards around assertions 1: b514184273 ! 2: a65f73f627 archive: use packet_reader for communications @@ -6,7 +6,10 @@ handling, which will make implementation of protocol v2 support in git-archive easier. +This refactoring does not change the behavior of "git archive". + Signed-off-by: Josh Steadmon +Reviewed-by: Stefan Beller diff --git a/builtin/archive.c b/builtin/archive.c @@ -42,24 +45,24 @@ - if (!buf) + status = packet_reader_read(); + -+ if (status == PACKET_READ_FLUSH) ++ if (status != PACKET_READ_NORMAL || reader.pktlen <= 0) die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(buf, "ACK")) { - if (starts_with(buf, "NACK ")) - die(_("git archive: NACK %s"), buf + 5); - if (starts_with(buf, "ERR ")) - die(_("remote error: %s"), buf + 4); -+ if (strcmp(reader.buffer, "ACK")) { -+ if (starts_with(reader.buffer, "NACK ")) -+ die(_("git archive: NACK %s"), reader.buffer + 5); -+ if (starts_with(reader.buffer, "ERR ")) -+ die(_("remote error: %s"), reader.buffer + 4); ++ if (strcmp(reader.line, "ACK")) { ++ if (starts_with(reader.line, "NACK ")) ++ die(_("git archive: NACK %s"), reader.line + 5); ++ if (starts_with(reader.line, "ERR ")) ++ die(_("remote error: %s"), reader.line + 4); die(_("git archive: protocol error")); } - if (packet_read_line(fd[0], NULL)) + status = packet_reader_read(); -+ if (status != PACKET_READ_FLUSH) ++ if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); /* Now, start reading from fd[0] and spit it out to stdout */ 2: 1518c15dc1 < -: -- archive: implement protocol v2 archive command -: -- > 3: 0a8cc5e331 archive: implement protocol v2 archive command 3: 1b7ad8d8f6 ! 4: 97a1424f32 archive: allow archive over HTTP(S) with proto v2 @@ -10,16 +10,20 @@ +++ b/builtin/archive.c @@ status = packet_reader_read(); - if (status != PACKET_READ_FLUSH) + if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); - } + } else if (version == protocol_v2 && -+ starts_with(transport->url, "http")) ++ (starts_with(transport->url, "http://;) || ++ starts_with(transport->url, "https://;))) + /* + * Commands over HTTP require two requests, so there's an -+ * additional server response to parse. ++ * additional server response to parse. We do only basic sanity ++ * checking here that the versions presented match across ++ * requests. + */ -+ discover_version(); ++ if (version != discover_version()) ++ die(_("git archive: received different
[PATCH 2/3] archive: implement protocol v2 archive command
This adds a new archive command for protocol v2. The command expects arguments in the form "argument X" which are passed unmodified to git-upload-archive--writer. This command works over the file://, Git, and SSH transports. HTTP support will be added in a separate patch. Signed-off-by: Josh Steadmon --- builtin/archive.c| 45 +++- builtin/upload-archive.c | 44 --- t/t5000-tar-tree.sh | 5 + 3 files changed, 77 insertions(+), 17 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e54fc39ad..73831887d 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -5,9 +5,11 @@ #include "cache.h" #include "builtin.h" #include "archive.h" +#include "connect.h" #include "transport.h" #include "parse-options.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" static void create_output_file(const char *output_file) @@ -23,6 +25,13 @@ static void create_output_file(const char *output_file) } } +static int do_v2_command_and_cap(int out) +{ + packet_write_fmt(out, "command=archive\n"); + /* Capability list would go here, if we had any. */ + packet_delim(out); +} + static int run_remote_archiver(int argc, const char **argv, const char *remote, const char *exec, const char *name_hint) @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv, struct remote *_remote; struct packet_reader reader; enum packet_read_status status; + enum protocol_version version; _remote = remote_get(remote); if (!_remote->url[0]) @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv, packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE); + version = discover_version(); + + if (version == protocol_v2) + do_v2_command_and_cap(fd[1]); + /* * Inject a fake --format field at the beginning of the * arguments, with the format inferred from our output @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv, packet_write_fmt(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); - status = packet_reader_read(); - - if (status == PACKET_READ_FLUSH) - die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(reader.buffer, "ACK")) { - if (starts_with(reader.buffer, "NACK ")) - die(_("git archive: NACK %s"), reader.buffer + 5); - if (starts_with(reader.buffer, "ERR ")) - die(_("remote error: %s"), reader.buffer + 4); - die(_("git archive: protocol error")); + if (version == protocol_v0) { + status = packet_reader_read(); + + if (status == PACKET_READ_FLUSH) + die(_("git archive: expected ACK/NAK, got a flush packet")); + if (strcmp(reader.buffer, "ACK")) { + if (starts_with(reader.buffer, "NACK ")) + die(_("git archive: NACK %s"), reader.buffer + 5); + if (starts_with(reader.buffer, "ERR ")) + die(_("remote error: %s"), reader.buffer + 4); + die(_("git archive: protocol error")); + } + + status = packet_reader_read(); + if (status != PACKET_READ_FLUSH) + die(_("git archive: expected a flush")); } - status = packet_reader_read(); - if (status != PACKET_READ_FLUSH) - die(_("git archive: expected a flush")); - /* Now, start reading from fd[0] and spit it out to stdout */ rv = recv_sideband("archive", fd[0], 1); rv |= transport_disconnect(transport); diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c index 25d911635..534e8fd56 100644 --- a/builtin/upload-archive.c +++ b/builtin/upload-archive.c @@ -5,6 +5,7 @@ #include "builtin.h" #include "archive.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" #include "run-command.h" #include "argv-array.h" @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band) return sz; } +static int handle_v2_command_and_cap(void) +{ + struct packet_reader reader; + enum packet_read_status status; + + packet_reader_init(, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE); + + packet_write_fmt(1, "versi
[PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
Signed-off-by: Josh Steadmon --- builtin/archive.c | 8 +++- http-backend.c | 10 +- transport-helper.c | 5 +++-- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index 73831887d..5fa75b3f7 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv, status = packet_reader_read(); if (status != PACKET_READ_FLUSH) die(_("git archive: expected a flush")); - } + } else if (version == protocol_v2 && + starts_with(transport->url, "http")) + /* +* Commands over HTTP require two requests, so there's an +* additional server response to parse. +*/ + discover_version(); /* Now, start reading from fd[0] and spit it out to stdout */ rv = recv_sideband("archive", fd[0], 1); diff --git a/http-backend.c b/http-backend.c index 458642ef7..d62d583c7 100644 --- a/http-backend.c +++ b/http-backend.c @@ -32,6 +32,7 @@ struct rpc_service { static struct rpc_service rpc_service[] = { { "upload-pack", "uploadpack", 1, 1 }, { "receive-pack", "receivepack", 0, -1 }, + { "upload-archive", "uploadarchive", 1, 1 }, }; static struct string_list *get_parameters(void) @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name) struct rpc_service *svc = select_service(hdr, service_name); struct strbuf buf = STRBUF_INIT; + if (!strcmp(service_name, "git-upload-archive")) { + /* git-upload-archive doesn't need --stateless-rpc */ + argv[1] = "."; + argv[2] = NULL; + } + strbuf_reset(); strbuf_addf(, "application/x-git-%s-request", svc->name); check_content_type(hdr, buf.buf); @@ -713,7 +720,8 @@ static struct service_cmd { {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}, {"POST", "/git-upload-pack$", service_rpc}, - {"POST", "/git-receive-pack$", service_rpc} + {"POST", "/git-receive-pack$", service_rpc}, + {"POST", "/git-upload-archive$", service_rpc}, }; static int bad_request(struct strbuf *hdr, const struct service_cmd *c) diff --git a/transport-helper.c b/transport-helper.c index 143ca008c..b4b96fc89 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport, ret = run_connect(transport, ); } else if (data->stateless_connect && (get_protocol_version_config() == protocol_v2) && - !strcmp("git-upload-pack", name)) { + (!strcmp("git-upload-pack", name) || + !strcmp("git-upload-archive", name))) { strbuf_addf(, "stateless-connect %s\n", name); ret = run_connect(transport, ); if (ret) @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name, /* Get_helper so connect is inited. */ get_helper(transport); - if (!data->connect) + if (!data->connect && !data->stateless_connect) die(_("operation not supported by protocol")); if (!process_connect_service(transport, name, exec)) -- 2.19.0.397.gdd90340f6a-goog
Add proto v2 archive command with HTTP support
This series adds a new protocol v2 command for archiving, and allows this command to work over HTTP(S). This was previously discussed in [1]. I've CCed everyone who participated in that discussion. [1]: https://public-inbox.org/git/CANq=j3tk7qebjoc7vnwkh4+wbnibmjjp5yukd9te5naywuk...@mail.gmail.com/
[PATCH 1/3] archive: use packet_reader for communications
Using packet_reader will simplify version detection and capability handling, which will make implementation of protocol v2 support in git-archive easier. Signed-off-by: Josh Steadmon --- builtin/archive.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e74f67539..e54fc39ad 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv, const char *remote, const char *exec, const char *name_hint) { - char *buf; int fd[2], i, rv; struct transport *transport; struct remote *_remote; + struct packet_reader reader; + enum packet_read_status status; _remote = remote_get(remote); if (!_remote->url[0]) @@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv, transport = transport_get(_remote, _remote->url[0]); transport_connect(transport, "git-upload-archive", exec, fd); + packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE); + /* * Inject a fake --format field at the beginning of the * arguments, with the format inferred from our output @@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv, packet_write_fmt(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); - buf = packet_read_line(fd[0], NULL); - if (!buf) + status = packet_reader_read(); + + if (status == PACKET_READ_FLUSH) die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(buf, "ACK")) { - if (starts_with(buf, "NACK ")) - die(_("git archive: NACK %s"), buf + 5); - if (starts_with(buf, "ERR ")) - die(_("remote error: %s"), buf + 4); + if (strcmp(reader.buffer, "ACK")) { + if (starts_with(reader.buffer, "NACK ")) + die(_("git archive: NACK %s"), reader.buffer + 5); + if (starts_with(reader.buffer, "ERR ")) + die(_("remote error: %s"), reader.buffer + 4); die(_("git archive: protocol error")); } - if (packet_read_line(fd[0], NULL)) + status = packet_reader_read(); + if (status != PACKET_READ_FLUSH) die(_("git archive: expected a flush")); /* Now, start reading from fd[0] and spit it out to stdout */ -- 2.19.0.397.gdd90340f6a-goog
[PATCH v2] config: document value 2 for protocol.version
From: Brandon Williams Update the config documentation to note the value `2` as an acceptable value for the protocol.version config. Signed-off-by: Brandon Williams Signed-off-by: Josh Steadmon --- Documentation/config.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index eb66a1197..ee3b5dd8e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2828,6 +2828,8 @@ protocol.version:: * `1` - the original wire protocol with the addition of a version string in the initial response from the server. +* `2` - link:technical/protocol-v2.html[wire protocol version 2]. + -- pull.ff:: -- 2.19.0.rc2.392.g5ba43deb5a-goog
Proposed approaches to supporting HTTP remotes in "git archive"
# Supporting HTTP remotes in "git archive" We would like to allow remote archiving from HTTP servers. There are a few possible implementations to be discussed: ## Shallow clone to temporary repo This approach builds on existing endpoints. Clients will connect to the remote server's git-upload-pack service and fetch a shallow clone of the requested commit into a temporary local repo. The write_archive() function is then called on the local clone to write out the requested archive. ### Benefits * This can be implemented entirely in builtin/archive.c. No new service endpoints or server code are required. * The archive is generated and compressed on the client side. This reduces CPU load on the server (for compressed archives) which would otherwise be a potential DoS vector. * This provides a git-native way to archive any HTTP servers that support the git-upload-pack service; some providers (including GitHub) do not currently allow the git-upload-archive service. ### Drawbacks * Archives generated remotely may not be bit-for-bit identical compared to those generated locally, if the versions of git used on the client and on the server differ. * This requires higher bandwidth compared to transferring a compressed archive generated on the server. ## Use git-upload-archive This approach requires adding support for the git-upload-archive endpoint to the HTTP backend. Clients will connect to the remote server's git-upload-archive service and the server will generate the archive which is then delivered to the client. ### Benefits * Matches existing "git archive" behavior for other remotes. * Requires less bandwidth to send a compressed archive than a shallow clone. * Resulting archive does not depend in any way on the client implementation. ### Drawbacks * Implementation is more complicated; it will require changes to (at least) builtin/archive.c, http-backend.c, and builtin/upload-archive.c. * Generates more CPU load on the server when compressing archives. This is potentially a DoS vector. * Does not allow archiving from servers that don't support the git-upload-archive service. ## Add a new protocol v2 "archive" command I am still a bit hazy on the exact details of this approach, please forgive any inaccuracies (I'm a new contributor and haven't examined custom v2 commands in much detail yet). This approach builds off the existing v2 upload-pack endpoint. The client will issue an archive command (with options to select particular paths or a tree-ish). The server will generate the archive and deliver it to the client. ### Benefits * Requires less bandwidth to send a compressed archive than a shallow clone. * Resulting archive does not depend in any way on the client implementation. ### Drawbacks * Generates more CPU load on the server when compressing archives. This is potentially a DoS vector. * Servers must support the v2 protocol (although the client could potentially fallback to some other supported remote archive functionality). ### Unknowns * I am not clear on the relative complexity of this approach compared to the others, and would appreciate any guidance offered. ## Summary Personally, I lean towards the first approach. It could give us an opportunity to remove server-side complexity; there is no reason that the shallow-clone approach must be restricted to the HTTP transport, and we could re-implement other transports using this method. Additionally, it would allow clients to pull archives from remotes that would not otherwise support it. That said, I am happy to work on whichever approach the community deems most worthwhile.
[PATCH] protocol-v2 doc: put HTTP headers after request
HTTP servers return 400 if you send headers before the GET request. Signed-off-by: Josh Steadmon --- Documentation/technical/protocol-v2.txt | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 49bda76d2..f58f24b1e 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -64,9 +64,8 @@ When using the http:// or https:// transport a client makes a "smart" info/refs request as described in `http-protocol.txt` and requests that v2 be used by supplying "version=2" in the `Git-Protocol` header. - C: Git-Protocol: version=2 - C: C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 + C: Git-Protocol: version=2 A v2 server would reply: -- 2.18.0.rc2.346.g013aa6912e-goog