Re: [PATCH 9/9] clone: run check_everything_connected

2013-03-31 Thread Duy Nguyen
On Thu, Mar 28, 2013 at 07:40:51AM +0700, Duy Nguyen wrote:
 Maybe we could do it in index-pack to save some (wall) time. I haven't
 tried but I think it might work. The problem is to make sure the pack
 contains objects for all sha1 references in the pack. By that
 description, we don't need to do standard DAG traversal. We could
 extract sha-1 references in index-pack as we uncompress objects and
 put all want sha-1 in a hash table. At the end of index-pack, we
 check if any sha-1 in the hash table still points to non-existing
 object.
 
 This way, at least we don't need to uncompress all objects again in
 rev-list. We could parse+hash in both phases in index-pack. The first
 phase (parse_pack_objects) is usually I/O bound, we could hide some
 cost there. The second phase is multithreaded, all the better.

It looks like what I describe above is exactly what index-pack
--strict does. Except that it holds the lock longer and has more
abstraction layers to slow things down. On linux-2.6 with 3 threads:

$ rev-list --all --objects --quiet (aka check_everything_connected)
34.26user 0.22system 0:34.56elapsed 99%CPU (0avgtext+0avgdata 
2550528maxresident)k
0inputs+0outputs (0major+208569minor)pagefaults 0swaps

$ index-pack --stdin
214.57user 8.38system 1:31.82elapsed 242%CPU (0avgtext+0avgdata 
1357328maxresident)k
8inputs+1421016outputs (0major+1222537minor)pagefaults 0swaps

$ index-pack --stdin --strict
297.36user 13.77system 2:11.82elapsed 236%CPU (0avgtext+0avgdata 
1875040maxresident)k
0inputs+1421016outputs (0major+1308718minor)pagefaults 0swaps

$ index-pack --stdin --connectivity
231.09user 7.42system 1:37.39elapsed 244%CPU (0avgtext+0avgdata 
2080816maxresident)k
0inputs+1421016outputs (0major+540069minor)pagefaults 0swaps

The last one does not hold locks by duplicating object hash table per
thread. As you can see the consumed memory is much higher than --stdin.
In return it only adds up 1/3 of rev-list time.

Maybe you should check which one is cheaper for clone case,
check_everything_connected() or index-pack --strict.
--
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] clone: run check_everything_connected

2013-03-26 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 The slowdown is really quite terrible if you try git clone --bare
 linux-2.6.git. Even with this, the local-clone case already misses blob
 corruption. So it probably makes sense to restrict it to just the
 non-local clone case, which already has to do more work.

Probably.  We may want to enable fsck even for local clones in the
longer term and also have this check.  Those who know their filesystem
is trustworthy can do the filesystem-level copy with cp -R themselves
after all.

 Even still, it adds a non-trivial amount of work (linux-2.6 takes
 something like a minute to check). I don't like the idea of declaring
 git clone non-safe unless you turn on transfer.fsckObjects, though. It
 should have the same safety as git fetch.

True.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] clone: run check_everything_connected

2013-03-26 Thread Jeff King
On Tue, Mar 26, 2013 at 07:53:42AM +0700, Nguyen Thai Ngoc Duy wrote:

 On Tue, Mar 26, 2013 at 3:26 AM, Jeff King p...@peff.net wrote:
   static void update_remote_refs(const struct ref *refs,
 const struct ref *mapped_refs,
 const struct ref *remote_head_points_at,
 const char *branch_top,
 const char *msg)
   {
  +   const struct ref *rm = mapped_refs;
  +
  +   if (check_everything_connected(iterate_ref_map, 0, rm))
  +   die(_(remote did not send all necessary objects));
  +
  if (refs) {
  write_remote_refs(mapped_refs);
  if (option_single_branch)
 
 Maybe move this after checkout, so that I can switch terminal and
 start working while it's verifying? And maybe an option not to
 check_everything_connected, instead print a big fat warning telling
 the user to fsck later?

I tried to follow the fetch process of not installing the refs until we
had verified that the objects were reasonable. It probably doesn't
matter that much for clone, since you would not have simultaneous users
expecting the repository to be in a reasonable state until after clone
completes, though.

We also would have to tweak check_everything_connected, which does
something like --not --all to avoid rechecking objects we already
have. But that is not too hard to do.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] clone: run check_everything_connected

2013-03-25 Thread Duy Nguyen
On Tue, Mar 26, 2013 at 3:26 AM, Jeff King p...@peff.net wrote:
  static void update_remote_refs(const struct ref *refs,
const struct ref *mapped_refs,
const struct ref *remote_head_points_at,
const char *branch_top,
const char *msg)
  {
 +   const struct ref *rm = mapped_refs;
 +
 +   if (check_everything_connected(iterate_ref_map, 0, rm))
 +   die(_(remote did not send all necessary objects));
 +
 if (refs) {
 write_remote_refs(mapped_refs);
 if (option_single_branch)

Maybe move this after checkout, so that I can switch terminal and
start working while it's verifying? And maybe an option not to
check_everything_connected, instead print a big fat warning telling
the user to fsck later?
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html