Re: [PATCH] Add failing test for fetching from multiple packs over dumb httpd

2015-01-27 Thread Charles Bailey
On Tue, Jan 27, 2015 at 01:12:21PM -0500, Jeff King wrote:
 On Tue, Jan 27, 2015 at 03:20:41PM +, Charles Bailey wrote:
 
  From: Charles Bailey cbaile...@bloomberg.net
  
  When objects are spread across multiple packs, if an initial fetch does
  require all pack files, a subsequent fetch for objects in packs not
  retrieved in the initial fetch will fail.
 
 s/does/does not/, I think?

Yes, that's definitely what I meant to write.

[...]
 It looks like the culprit is 7b64469 (Allow parse_pack_index on
 temporary files, 2010-04-19). It added a new idx_path parameter to
 parse_pack_index, which we pass as NULL.  That causes its call to
 check_packed_git_idx to fail (because it has no idea what file we are
 talking about!).

That change looks like it went into 1.7.1.1. I cannot confirm that this
worked before then, but we've definitely seen the bug in 1.7.12.3 and
more recent versions.

 This seems to fix it:
 
 diff --git a/sha1_file.c b/sha1_file.c
 index 30995e6..eda4d90 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path)
  	const char *path = sha1_pack_name(sha1);
  	struct packed_git *p = alloc_packed_git(strlen(path) + 1);
  
 +	if (!idx_path)
 +		idx_path = sha1_pack_index_name(sha1);
 +
  	strcpy(p->pack_name, path);
  	hashcpy(p->sha1, sha1);
  	if (check_packed_git_idx(idx_path, p)) {

It certainly fixes my test script and I can give this patch a test in
the 'real' world.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add failing test for fetching from multiple packs over dumb httpd

2015-01-27 Thread Jeff King
On Tue, Jan 27, 2015 at 03:20:41PM +, Charles Bailey wrote:

 From: Charles Bailey cbaile...@bloomberg.net
 
 When objects are spread across multiple packs, if an initial fetch does
 require all pack files, a subsequent fetch for objects in packs not
 retrieved in the initial fetch will fail.

s/does/does not/, I think?

 I'm not very familiar with the http client code so this analysis is based
 purely on observed behaviour.

Debugging the http code is a royal pain because all the work happens in
a separate helper. I use a git-remote-debug script like this:

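  #!/bin/sh
  # Wrap the real remote helper in gdbserver and print the gdb command
  # needed to attach to it.
  host=localhost:5001
  # The URL is the second argument (or the first if only one is given).
  proto=$(echo "${2:-$1}" | sed 's/:.*//')
  prog=git-remote-$proto
  echo >&2 "gdb -ex 'target remote $host' $prog"
  gdbserver "$host" "$prog" "$@"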

and then you can use:

  git fetch debug::http://...

in the test script, cut-and-paste the gdb command printed to stderr, and
you're dropped into the appropriate debugger without worrying about all
of the stdio mess.

 When fetching only some refs from a repository served over dumb httpd Git
 appears to download all of the index files for the available packs but then
 only chooses the pack files that help it resolve the objects which we need.

Right. And it looks like we have special code in sha1_file.c to make
sure we do not trust an index which does not have a matching packfile.
So that's good.

The http-walker code does its own check, in fetch_and_setup_pack_index,
that checks for an existing valid copy of the index. If we don't have
it, we download the index and proceed. If we do, we skip straight to
grabbing the pack. But if we have it and it doesn't appear valid, we
return an error. And there seems to be a bug with checking the validity.
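
As a rough illustration of the three cases just described (a sketch, not
the http-walker code): the name idx_action is hypothetical, and
`git show-index` merely stands in for the real validity check, which Git
performs in C via check_packed_git_idx.

```shell
#!/bin/sh
# idx_action PACKDIR HASH: print the next step for PACKDIR/pack-HASH.idx.
# Hypothetical helper for illustration; not part of Git.
idx_action () {
	idx=$1/pack-$2.idx
	if ! test -f "$idx"
	then
		# No local copy: download the index and proceed.
		echo download-index
	elif git show-index <"$idx" >/dev/null 2>&1
	then
		# Local copy parses cleanly: go straight to the pack.
		echo fetch-pack
	else
		# Local copy exists but does not look valid: give up.
		echo error
	fi
}
```

As described in this thread, with the bug the validity check fails for an
index that is already on disk, so a second fetch ends up in the error
branch.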

It looks like the culprit is 7b64469 (Allow parse_pack_index on
temporary files, 2010-04-19). It added a new idx_path parameter to
parse_pack_index, which we pass as NULL.  That causes its call to
check_packed_git_idx to fail (because it has no idea what file we are
talking about!).

This seems to fix it:

diff --git a/sha1_file.c b/sha1_file.c
index 30995e6..eda4d90 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path)
 	const char *path = sha1_pack_name(sha1);
 	struct packed_git *p = alloc_packed_git(strlen(path) + 1);
 
+	if (!idx_path)
+		idx_path = sha1_pack_index_name(sha1);
+
 	strcpy(p->pack_name, path);
 	hashcpy(p->sha1, sha1);
 	if (check_packed_git_idx(idx_path, p)) {

(Alternatively, we could pass in sha1_pack_index_name instead of NULL in
the first place, but I think it is reasonable for parse_pack_index to
take care of this).

I think it may also make sense for fetch_and_setup_pack_index to delete
and re-download a broken .idx file (rather than aborting), but I don't
think that's a big deal. It should only happen in the face of on-disk
data corruption, and the user can remove the broken .idx themselves.

-Peff


[PATCH] Add failing test for fetching from multiple packs over dumb httpd

2015-01-27 Thread Charles Bailey
From: Charles Bailey cbaile...@bloomberg.net

When objects are spread across multiple packs, if an initial fetch does
require all pack files, a subsequent fetch for objects in packs not
retrieved in the initial fetch will fail.
---

I'm not very familiar with the http client code so this analysis is based
purely on observed behaviour.

When fetching only some refs from a repository served over dumb httpd Git
appears to download all of the index files for the available packs but then
only chooses the pack files that help it resolve the objects which we need.

If we then later try to fetch an object which is in a pack file for
which Git has previously downloaded an index file, it seems to trip because it
believes it already has the object locally due to the presence of the index
file but doesn't actually have it because it never retrieved the corresponding
pack file. It reports an error of the form Cannot obtain needed object 

Manually deleting index files which have no corresponding local pack
file will allow a repeat of the failed fetch to succeed.
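
The manual workaround above can be sketched as a small shell helper. This
is an illustration only: list_orphan_idx is a hypothetical name, and the
sketch assumes the usual objects/pack layout, in which pack-X.idx sits
next to pack-X.pack, and that no fetch is running while it is used.

```shell
#!/bin/sh
# list_orphan_idx DIR: print each DIR/pack-*.idx that has no matching .pack.
# Hypothetical helper for illustration; not part of Git.
list_orphan_idx () {
	dir=$1
	for idx in "$dir"/pack-*.idx
	do
		# An unmatched glob stays literal, so skip anything that is not a file.
		test -f "$idx" || continue
		pack=${idx%.idx}.pack
		test -f "$pack" || printf '%s\n' "$idx"
	done
}
```

For example, `list_orphan_idx clone_packed_branches.git/objects/pack`
prints the candidates; after reviewing the list, each printed file can be
removed with rm before repeating the fetch.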

 t/t5550-http-fetch-dumb.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index ac71418..cf2362a 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -165,6 +165,24 @@ test_expect_success 'fetch notices corrupt idx' '
 	)
 '
 
+test_expect_failure 'fetch packed branches' '
+	git checkout --orphan branch1 &&
+	echo base >file &&
+	git add file &&
+	git commit -m base &&
+	git --bare init "$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git &&
+	git push "$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git branch1 &&
+	git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git repack -d &&
+	git checkout -b branch2 branch1 &&
+	echo b2 >file &&
+	git commit -a -m b2 &&
+	git push "$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git branch2 &&
+	git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git repack -d &&
+	git --bare init clone_packed_branches.git &&
+	git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch1:branch1 &&
+	git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2
+'
+
 test_expect_success 'did not use upload-pack service' '
 	grep '/git-upload-pack' <"$HTTPD_ROOT_PATH"/access.log >act
 	: >exp
-- 
2.0.2.611.g8c85416
