Re: [PATCH v2 0/25] prune-safety

2014-10-18 Thread Jeff King
On Thu, Oct 16, 2014 at 05:39:18PM -0400, Jeff King wrote:

 @@ -301,6 +297,14 @@ static struct commit *handle_commit(struct rev_info 
 *revs,
   die(bad object %s, sha1_to_hex(tag-tagged-sha1));
   }
   object-flags |= flags;
 + /*
 +  * We'll handle the tagged object by looping or dropping
 +  * through to the non-tag handlers below. Do not
 +  * propagate data from the tag's pending entry.
 +  */
 + name = NULL;
 + path = NULL;
 + mode = 0;

Hmm. On second thought (and after seeing a warning from Coverity), this
should be:

diff --git a/revision.c b/revision.c
index 8030fc8..ebe3e93 100644
--- a/revision.c
+++ b/revision.c
@@ -302,7 +302,7 @@ static struct commit *handle_commit(struct rev_info *revs,
 * through to the non-tag handlers below. Do not
 * propagate data from the tag's pending entry.
 */
-   name = NULL;
+   name = ;
path = NULL;
mode = 0;
}

The rest of the function assumes that name is not NULL (which I'm not
sure is entirely safe, as add_pending_object can take a NULL; presumably
every add uses the empty string instead of NULL. But either way,
setting it to NULL here is definite wrong).

The path field is explicitly OK to be NULL.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Junio C Hamano
Somewhere in this series the object enumeration seems to get
broken.  git clone --no-local of git.git repository died
with

Cloning into 'victim-7'...
remote: Counting objects: 173727, done.
remote: Compressing objects: 100% (43697/43697), done.
remote: Total 173727 (delta 130908), reused 170009 (delta 128151)
Receiving objects: 100% (173727/173727), 33.45 MiB | 13.69 MiB/s,
done.
Resolving deltas: 100% (130908/130908), done.
fatal: did not receive expected object 
a74380c3117994efba501df1707418eb6feb9284
fatal: index-pack failed

where a74380c... is such an object.

If you have a clone of https://github.com/git/git.git

$ git rev-list --parents --objects --all | grep  a74380c3117994

would be an easy way to reproduce.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Somewhere in this series the object enumeration seems to get
 broken.  git clone --no-local of git.git repository died
 with

 Cloning into 'victim-7'...
 remote: Counting objects: 173727, done.
 remote: Compressing objects: 100% (43697/43697), done.
 remote: Total 173727 (delta 130908), reused 170009 (delta 128151)
 Receiving objects: 100% (173727/173727), 33.45 MiB | 13.69 MiB/s,
 done.
 Resolving deltas: 100% (130908/130908), done.
 fatal: did not receive expected object 
 a74380c3117994efba501df1707418eb6feb9284
 fatal: index-pack failed

 where a74380c... is such an object.

Ehh, such is a nonsense.  It is a blob that directly is pointed by
a tag that is in refs/tags/*.

 If you have a clone of https://github.com/git/git.git

 $ git rev-list --parents --objects --all | grep  a74380c3117994

 would be an easy way to reproduce.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Jeff King
On Thu, Oct 16, 2014 at 02:07:47PM -0700, Junio C Hamano wrote:

 Somewhere in this series the object enumeration seems to get
 broken.  git clone --no-local of git.git repository died
 with
 
 Cloning into 'victim-7'...
 remote: Counting objects: 173727, done.
 remote: Compressing objects: 100% (43697/43697), done.
 remote: Total 173727 (delta 130908), reused 170009 (delta 128151)
 Receiving objects: 100% (173727/173727), 33.45 MiB | 13.69 MiB/s,
 done.
 Resolving deltas: 100% (130908/130908), done.
 fatal: did not receive expected object 
 a74380c3117994efba501df1707418eb6feb9284
 fatal: index-pack failed
 
 where a74380c... is such an object.
 
 If you have a clone of https://github.com/git/git.git

Hrm. I cannot reproduce the clone failure...

 $ git rev-list --parents --objects --all | grep  a74380c3117994
 
 would be an easy way to reproduce.

But this I can. Digging into it...

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Jeff King
On Thu, Oct 16, 2014 at 05:21:12PM -0400, Jeff King wrote:

 Hrm. I cannot reproduce the clone failure...

Oh, because I have bitmaps turned on, and we do not run the list-objects
code at all in that case.

The bug unsurprisingly bisects to traverse_commit_list: support
pending blobs/trees with paths. The problem is that I didn't notice
that handle_commit actually rewrites the object pointer when
unwrapping the tags, and that commit reuses the original pointer from
the entry. So we end up putting two copies of the tag object into the
pending list, rather than the tag and the blob.

The fix is:

diff --git a/revision.c b/revision.c
index 9a0f99a..8030fc8 100644
--- a/revision.c
+++ b/revision.c
@@ -231,12 +231,6 @@ static void add_pending_object_with_mode(struct rev_info 
*revs,
add_pending_object_with_path(revs, obj, name, mode, NULL);
 }
 
-static void add_pending_object_from_entry(struct rev_info *revs,
- struct object_array_entry *e)
-{
-   add_pending_object_with_path(revs, e-item, e-name, e-mode, e-path);
-}
-
 void add_pending_object(struct rev_info *revs,
struct object *obj, const char *name)
 {
@@ -283,6 +277,8 @@ static struct commit *handle_commit(struct rev_info *revs,
 {
struct object *object = entry-item;
const char *name = entry-name;
+   const char *path = entry-path;
+   unsigned int mode = entry-mode;
unsigned long flags = object-flags;
 
/*
@@ -301,6 +297,14 @@ static struct commit *handle_commit(struct rev_info *revs,
die(bad object %s, sha1_to_hex(tag-tagged-sha1));
}
object-flags |= flags;
+   /*
+* We'll handle the tagged object by looping or dropping
+* through to the non-tag handlers below. Do not
+* propagate data from the tag's pending entry.
+*/
+   name = NULL;
+   path = NULL;
+   mode = 0;
}
 
/*
@@ -332,7 +336,7 @@ static struct commit *handle_commit(struct rev_info *revs,
mark_tree_contents_uninteresting(tree);
return NULL;
}
-   add_pending_object_from_entry(revs, entry);
+   add_pending_object_with_path(revs, object, name, mode, path);
return NULL;
}
 
@@ -344,7 +348,7 @@ static struct commit *handle_commit(struct rev_info *revs,
return NULL;
if (flags  UNINTERESTING)
return NULL;
-   add_pending_object_from_entry(revs, entry);
+   add_pending_object_with_path(revs, object, name, mode, path);
return NULL;
}
die(%s is unknown object, name);

We should probably add a test for cloning a tag of an otherwise
unreferenced object, too.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Oct 16, 2014 at 05:21:12PM -0400, Jeff King wrote:

 Hrm. I cannot reproduce the clone failure...

 Oh, because I have bitmaps turned on, and we do not run the list-objects
 code at all in that case.

;-)

 We should probably add a test for cloning a tag of an otherwise
 unreferenced object, too.

Yeah, that too ;-)

Thanks for a quick diag.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Jeff King
On Thu, Oct 16, 2014 at 03:18:34PM -0700, Junio C Hamano wrote:

  We should probably add a test for cloning a tag of an otherwise
  unreferenced object, too.
 
 Yeah, that too ;-)

Here's that test (after the scissors below). It can be applied totally
separately, though I stuck it in as patch 18.5/25 in the existing
series to confirm that my original series does cause the test to fail,
and that it passes with the patch I posted earlier.

Do you want to just squash the fix I posted earlier (into patch 19, the
traverse_commit_list: ... one)? I'm happy to repost the revised patch,
but my impression of your workflow is that squashing is just as easy
than replacing a patch (i.e., you're running rebase -i either way).

Or I'm happy to re-post the whole series, but it's rather long. :)

 Thanks for a quick diag.

I'm impressed that you found the bug so quickly. :) My biggest fear with
the whole series is that it's disrupting and refactoring some
fundamental parts of the code that might cause regressions. I put a lot
of my faith in the test suite, but that did not work out here (I do
still feel pretty good about the series overall, though I think I'd cook
it longer than usual given the complexity and the areas it touches).

-- 8 --
Subject: t5516: test pushing a tag of an otherwise unreferenced blob

It's not unreasonable to have a tag that points to a blob
that is not part of the normal history. We do this in
git.git to distribute gpg keys. However, we never explicitly
checked in our test suite that this actually works (i.e.,
that pack-objects actually sends the blob because of the tag
mentioning it).

It does in fact work fine, but a recent patch under
discussion broke this, and the test suite didn't notice.
Let's make the test suite more complete.

Signed-off-by: Jeff King p...@peff.net
---
The should have below is belt-and-suspenders. The test actually fails
with my series without the cat-file check, but I'm concerned a bug
that affects pack-objects could also affect the connectivity check on
the receiving side.

 t/t5516-fetch-push.sh | 13 +
 1 file changed, 13 insertions(+)

diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 67e0ab3..7c8a769 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1277,4 +1277,17 @@ EOF
git push --no-thin --receive-pack=$rcvpck no-thin/.git 
refs/heads/master:refs/heads/foo
 '
 
+test_expect_success 'pushing a tag pushes the tagged object' '
+   rm -rf dst.git 
+   blob=$(echo unreferenced | git hash-object -w --stdin) 
+   git tag -m foo tag-of-blob $blob 
+   git init --bare dst.git 
+   git push dst.git tag-of-blob 
+   # the receiving index-pack should have noticed
+   # any problems, but we double check
+   echo unreferenced expect 
+   git --git-dir=dst.git cat-file blob tag-of-blob actual 
+   test_cmp expect actual
+'
+
 test_done
-- 
2.1.2.596.g7379948

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/25] prune-safety

2014-10-16 Thread Jeff King
On Thu, Oct 16, 2014 at 09:13:39PM -0700, Junio C Hamano wrote:

 Just being curious, but would the same bug, if allowed to be triggered
 cutting repacking of your repository, have corrupted the resulting bitmap?

I didn't test, but yes, almost certainly. The bug was in list-objects.c,
which is used by pack-objects to generate the list of objects to pack,
as well as to build the bitmaps. So not only would it have corrupted the
bitmaps, a `git repack -ad`[1] would have dropped the object completely,
corrupting the repository!

-Peff

[1] git-gc uses `repack -Ad`, of course. So assuming you had packed more
recently than 2 weeks ago, it would have just been ejected to a
loose object. Small comfort. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/25] prune-safety

2014-10-15 Thread Jeff King
Here's a re-roll of the patch series I posted earlier to make git
prune keep more contiguous chunks of the object graph. The cleanups to
t5304 were spun off into their own series, and are dropped here.
However, the other patches seem to have multiplied in number (I must
have fed them after midnight).

Here are the changes since the first round (thanks everybody for your
comments):

  - fix bogus return values from freshen_file, foreach_alt_odb, and
for_each_packed_object

  - make for_each_object_in_pack static

  - clarify commit message for keep objects reachable from recent
objects patch (this was the one that confused Junio, and I
elaborated based on our discussion)

  - clarify the definition of loose object dirs in the comment above
for_each_loose_file_in_object_dir

  - in for_each_loose_file, traverse hashed loose object directories in
numeric order, and pass the number to the subdir callback (this is
used by prune-packed for its progress updates); as a side effect,
this fixes the bugs Michael noticed with the subdir callback.

  - prune-packed now reuses the for_each_loose_file interface

  - use revs-ignore_missing_links so we don't barf on already-missing
unreferenced objects

  - convert reachable.c to use traverse_commit_list instead of its own
custom walk; this gives support for ignore_missing_links above, and
saves us a fair bit of code.

  - while in the area, I noticed that reachable.c's reflog handling is
the same as rev-list's --reflog option; it now builds on what's in
revision.c.

That takes us up to patch 17. While working in reachable.c, I noticed an
oddity: we consider objects in the index to be reachable during prune
(which is good), but we do not when dropping them during a repack that
uses --unpack-unreachable=expiration. The remaining patches fix that,
which needed a fair bit of preparatory cleanup.

I'm really beginning to question whether the just drop objects that are
about to be pruned optimization done in 7e52f56 (gc: do not explode
objects which will be immediately pruned, 2012-04-07). It really
complicates things as pack-objects and prune need to have the exact same
rules (and implementing it naively, by having pack-objects run the same
code as prune, is not desirable because pack-objects has _already_ done
a complete expensive traversal to generate the packing list). And I fear
it will get even worse if we implement some of the race-condition fixes
that Michael suggested earlier.

On the other hand, the loosening behavior without 7e52f56 has some
severe pathological cases. A repository which has had a chunk of history
deleted can easily increase in size several orders of magnitude due to
loosening (since we lose the benefit of all deltas in the loosened
objects).

Finally, there are a few things that were discussed that I didn't
address/fix. I don't think any of them is a critical blocker, but I
did want to summarize the state:

  - when refreshing, we may update a pack's mtime multiple times. It
probably wouldn't be too hard to cache this and only update once per
program run, but I also don't think it's that big a deal in
practice.

  - We will munge mtimes of objects found in alternates. If we don't
have write access to the alternate, we'll create a local duplicate
of the object. This is the safer thing, but I'm not sure if there
are cases where we might try to write out a large number of objects
which exist in an alternate (OTOH, we will eventually drop them at
the next repack).

  - I didn't implement the sort by inode trick that fsck does when
traversing the loose objects. It wouldn't be too hard, but I'm not
convinced it's actually important.

  - I didn't convert fsck to the for_each_loose_file interface (mostly
because I didn't do the inode-sorting trick, and while I don't think
it matters, I didn't go to the work to show that it _doesn't_).

Here are the patches:

  [01/25]: foreach_alt_odb: propagate return value from callback
  [02/25]: isxdigit: cast input to unsigned char
  [03/25]: object_array: factor out slopbuf-freeing logic
  [04/25]: object_array: add a clear function
  [05/25]: clean up name allocation in prepare_revision_walk
  [06/25]: reachable: use traverse_commit_list instead of custom walk
  [07/25]: reachable: reuse revision.c add all reflogs code
  [08/25]: prune: factor out loose-object directory traversal
  [09/25]: reachable: mark index blobs as SEEN
  [10/25]: prune-packed: use for_each_loose_file_in_objdir
  [11/25]: count-objects: do not use xsize_t when counting object size
  [12/25]: count-objects: use for_each_loose_file_in_objdir
  [13/25]: sha1_file: add for_each iterators for loose and packed objects
  [14/25]: prune: keep objects reachable from recent objects
  [15/25]: pack-objects: refactor unpack-unreachable expiration check
  [16/25]: pack-objects: match prune logic for discarding objects
  [17/25]: write_sha1_file: freshen