Re: [PATCH] Add failing test for fetching from multiple packs over dumb httpd
On Tue, Jan 27, 2015 at 01:12:21PM -0500, Jeff King wrote: On Tue, Jan 27, 2015 at 03:20:41PM +, Charles Bailey wrote: From: Charles Bailey cbaile...@bloomberg.net When objects are spread across multiple packs, if an initial fetch does require all pack files, a subsequent fetch for objects in packs not retrieved in the initial fetch will fail. s/does/does not/, I think? Yes, that's definitely what I meant to write. [...] It looks like the culprit is 7b64469 (Allow parse_pack_index on temporary files, 2010-04-19). It added a new idx_path parameter to parse_pack_index, which we pass as NULL. That causes its call to check_packed_git_idx to fail (because it has no idea what file we are talking about!). That change looks like it went into 1.7.1.1. I cannot confirm this working before then but we've definitely seen the bug in 1.7.12.3 and more recent versions. This seems to fix it: diff --git a/sha1_file.c b/sha1_file.c index 30995e6..eda4d90 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) const char *path = sha1_pack_name(sha1); struct packed_git *p = alloc_packed_git(strlen(path) + 1); + if (!idx_path) + idx_path = sha1_pack_index_name(sha1); + strcpy(p-pack_name, path); hashcpy(p-sha1, sha1); if (check_packed_git_idx(idx_path, p)) { It certainly fixes my test script and I can give this patch a test in the 'real' world. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add failing test for fetching from multiple packs over dumb httpd
On Tue, Jan 27, 2015 at 03:20:41PM +, Charles Bailey wrote: From: Charles Bailey cbaile...@bloomberg.net When objects are spread across multiple packs, if an initial fetch does require all pack files, a subsequent fetch for objects in packs not retrieved in the initial fetch will fail. s/does/does not/, I think? I'm not very familiar with the http client code so this analysis is based purely on observed behaviour. Debugging the http code is a royal pain because all the work happens in a separate helper. I use a git-remote-debug script like this: #!/bin/sh host=localhost:5001 proto=$(echo ${2:-$1} | sed 's/:.*//') prog=git-remote-$proto echo 2 gdb -ex 'target remote $host' $prog gdbserver localhost:5001 $prog $@ and then you can use: git fetch debug::http://... in the test script, cut-and-paste the gdb command printed to stderr, and you're dropped into the appropriate debugger without worrying about all of the stdio mess. When fetching only some refs from a repository served over dumb httpd Git appears to download all of the index files for the available packs but then only chooses the pack files that help it resolve the objects which we need. Right. And it looks like we have special code in sha1_file.c to make sure we do not trust an index which does not have a matching packfile. So that's good. The http-walker code does its own check, in fetch_and_setup_pack_index, that checks for an existing valid copy of the index. If we don't have it, we download the index and proceed. If we do, we skip straight to grabbing the pack. But if we have it and it doesn't appear valid, we return an error. And there seems to be a bug with checking the validity. It looks like the culprit is 7b64469 (Allow parse_pack_index on temporary files, 2010-04-19). It added a new idx_path parameter to parse_pack_index, which we pass as NULL. That causes its call to check_packed_git_idx to fail (because it has no idea what file we are talking about!). This seems to fix it: diff --git a/sha1_file.c b/sha1_file.c index 30995e6..eda4d90 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) const char *path = sha1_pack_name(sha1); struct packed_git *p = alloc_packed_git(strlen(path) + 1); + if (!idx_path) + idx_path = sha1_pack_index_name(sha1); + strcpy(p-pack_name, path); hashcpy(p-sha1, sha1); if (check_packed_git_idx(idx_path, p)) { (Alternatively, we could pass in sha1_pack_index_name instead of NULL in the first place, but I think it is reasonable for parse_pack_index to take care of this). I think it may also make sense for fetch_and_setup_pack_index to delete and re-download a broken .idx file (rather than aborting), but I don't think that's a big deal. It should only happen in the face of on-disk data corruption, and the user can remove the broken .idx themselves. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add failing test for fetching from multiple packs over dumb httpd
From: Charles Bailey cbaile...@bloomberg.net When objects are spread across multiple packs, if an initial fetch does require all pack files, a subsequent fetch for objects in packs not retrieved in the initial fetch will fail. --- I'm not very familiar with the http client code so this analysis is based purely on observed behaviour. When fetching only some refs from a repository served over dumb httpd Git appears to download all of the index files for the available packs but then only chooses the pack files that help it resolve the objects which we need. If we then later try to fetch an object which is in a pack file for which Git has previously downloaded an index file, it seems to trip because it believes it already has the object locally due to the presence of the index file but doesn't actually have it because it never retrieved the corresponding pack file. It reports an error of the form Cannot obtain needed object Manually deleting index files which have no corresponding local pack file will allow a repeat of the failed fetch to succeed. t/t5550-http-fetch-dumb.sh | 18 ++ 1 file changed, 18 insertions(+) diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index ac71418..cf2362a 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -165,6 +165,24 @@ test_expect_success 'fetch notices corrupt idx' ' ) ' +test_expect_failure 'fetch packed branches' ' + git checkout --orphan branch1 + echo base file + git add file + git commit -m base + git --bare init $HTTPD_DOCUMENT_ROOT_PATH/repo_packed_branches.git + git push $HTTPD_DOCUMENT_ROOT_PATH/repo_packed_branches.git branch1 + git --git-dir=$HTTPD_DOCUMENT_ROOT_PATH/repo_packed_branches.git repack -d + git checkout -b branch2 branch1 + echo b2 file + git commit -a -m b2 + git push $HTTPD_DOCUMENT_ROOT_PATH/repo_packed_branches.git branch2 + git --git-dir=$HTTPD_DOCUMENT_ROOT_PATH/repo_packed_branches.git repack -d + git --bare init clone_packed_branches.git + git --git-dir=clone_packed_branches.git fetch $HTTPD_URL/dumb/repo_packed_branches.git branch1:branch1 + git --git-dir=clone_packed_branches.git fetch $HTTPD_URL/dumb/repo_packed_branches.git branch2:branch2 +' + test_expect_success 'did not use upload-pack service' ' grep '/git-upload-pack' $HTTPD_ROOT_PATH/access.log act : exp -- 2.0.2.611.g8c85416 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html