Re: [PATCH 9/9] clone: run check_everything_connected
On Thu, Mar 28, 2013 at 07:40:51AM +0700, Duy Nguyen wrote: Maybe we could do it in index-pack to save some (wall) time. I haven't tried but I think it might work. The problem is to make sure the pack contains objects for all sha1 references in the pack. By that description, we don't need to do standard DAG traversal. We could extract sha-1 references in index-pack as we uncompress objects and put all want sha-1 in a hash table. At the end of index-pack, we check if any sha-1 in the hash table still points to non-existing object. This way, at least we don't need to uncompress all objects again in rev-list. We could parse+hash in both phases in index-pack. The first phase (parse_pack_objects) is usually I/O bound, we could hide some cost there. The second phase is multithreaded, all the better. It looks like what I describe above is exactly what index-pack --strict does. Except that it holds the lock longer and has more abstraction layers to slow things down. On linux-2.6 with 3 threads: $ rev-list --all --objects --quiet (aka check_everything_connected) 34.26user 0.22system 0:34.56elapsed 99%CPU (0avgtext+0avgdata 2550528maxresident)k 0inputs+0outputs (0major+208569minor)pagefaults 0swaps $ index-pack --stdin 214.57user 8.38system 1:31.82elapsed 242%CPU (0avgtext+0avgdata 1357328maxresident)k 8inputs+1421016outputs (0major+1222537minor)pagefaults 0swaps $ index-pack --stdin --strict 297.36user 13.77system 2:11.82elapsed 236%CPU (0avgtext+0avgdata 1875040maxresident)k 0inputs+1421016outputs (0major+1308718minor)pagefaults 0swaps $ index-pack --stdin --connectivity 231.09user 7.42system 1:37.39elapsed 244%CPU (0avgtext+0avgdata 2080816maxresident)k 0inputs+1421016outputs (0major+540069minor)pagefaults 0swaps The last one does not hold locks by duplicating object hash table per thread. As you can see the consumed memory is much higher than --stdin. In return it only adds up 1/3 of rev-list time. Maybe you should check which one is cheaper for clone case, check_everything_connected() or index-pack --strict. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] clone: run check_everything_connected
Jeff King p...@peff.net writes: The slowdown is really quite terrible if you try git clone --bare linux-2.6.git. Even with this, the local-clone case already misses blob corruption. So it probably makes sense to restrict it to just the non-local clone case, which already has to do more work. Probably. We may want to enable fsck even for local clones in the longer term and also have this check. Those who know their filesystem is trustworthy can do the filesystem-level copy with cp -R themselves after all. Even still, it adds a non-trivial amount of work (linux-2.6 takes something like a minute to check). I don't like the idea of declaring git clone non-safe unless you turn on transfer.fsckObjects, though. It should have the same safety as git fetch. True. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] clone: run check_everything_connected
On Tue, Mar 26, 2013 at 07:53:42AM +0700, Nguyen Thai Ngoc Duy wrote: On Tue, Mar 26, 2013 at 3:26 AM, Jeff King p...@peff.net wrote: static void update_remote_refs(const struct ref *refs, const struct ref *mapped_refs, const struct ref *remote_head_points_at, const char *branch_top, const char *msg) { + const struct ref *rm = mapped_refs; + + if (check_everything_connected(iterate_ref_map, 0, rm)) + die(_(remote did not send all necessary objects)); + if (refs) { write_remote_refs(mapped_refs); if (option_single_branch) Maybe move this after checkout, so that I can switch terminal and start working while it's verifying? And maybe an option not to check_everything_connected, instead print a big fat warning telling the user to fsck later? I tried to follow the fetch process of not installing the refs until we had verified that the objects were reasonable. It probably doesn't matter that much for clone, since you would not have simultaneous users expecting the repository to be in a reasonable state until after clone completes, though. We also would have to tweak check_everything_connected, which does something like --not --all to avoid rechecking objects we already have. But that is not too hard to do. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] clone: run check_everything_connected
On Tue, Mar 26, 2013 at 3:26 AM, Jeff King p...@peff.net wrote: static void update_remote_refs(const struct ref *refs, const struct ref *mapped_refs, const struct ref *remote_head_points_at, const char *branch_top, const char *msg) { + const struct ref *rm = mapped_refs; + + if (check_everything_connected(iterate_ref_map, 0, rm)) + die(_(remote did not send all necessary objects)); + if (refs) { write_remote_refs(mapped_refs); if (option_single_branch) Maybe move this after checkout, so that I can switch terminal and start working while it's verifying? And maybe an option not to check_everything_connected, instead print a big fat warning telling the user to fsck later? -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html