Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects
On Thu, Oct 31, 2013 at 3:07 AM, Jeff King p...@peff.net wrote: I think there are two cases that we need to consider: 1. We have a full repo and somebody requests a shallow clone for us. We probably do not want to use bitmaps here. In the series we have been testing, shallow clones turned off bitmaps because we do not use the internal rev_list. But as of cdab485 (upload-pack: delegate rev walking in shallow fetch to pack-objects), that distinction doesn't hold. I think we can check the use of --shallow-file instead of explicitly turning off bitmaps there. There's an (non-existing yet) case 1': somebody requests a clone and the source clone is already shallow. is_repository_shallow() could catch both cases. 2. We have a shallow clone that wants to repack. We probably want to turn off bitmap writing here. I don't think that grafts actually matter here, because pack-objects should always be looking at the true graph. It would mean that using git rev-list --use-bitmap-index does not respect the grafts, and we should probably disable it in that case (and ditto for replacements). Right. I forgot that the repo must be complete before it's grafted. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects
On Sat, Oct 26, 2013 at 05:25:14PM +0700, Nguyen Thai Ngoc Duy wrote: For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the have objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. I haven't read the bitmap creation code yet. But it probably does not matter. If the client requests a shallow fetch, you probably want to fall back to normal walk too. One other criterion I should have mentioned: we must be using the internal rev-list. That prevented us in v1.8.4.1 and earlier from using bitmaps for shallow fetches. But as of v1.8.4.2, we always use pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for shallow fetches. As for repos that are themselves shallow, I do not know how doing a repack -b would fare. Probably not well. Bitmaps may be made work with shallow fetches too, I'm not sure. We could substract the shallow'd commits out. The problem is if some other commits share parts of the shallow'd commits, I'm not sure how to detect that. Yeah, I do not think shallow follows the same have/want reachability rules that we rely on for taking the set difference via bitmaps. It may be made to work eventually, but I think the important thing at this point is making sure we properly fall back to the slow method when shallow. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects
On Wed, Oct 30, 2013 at 2:36 PM, Jeff King p...@peff.net wrote: On Sat, Oct 26, 2013 at 05:25:14PM +0700, Nguyen Thai Ngoc Duy wrote: For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the have objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. I haven't read the bitmap creation code yet. But it probably does not matter. If the client requests a shallow fetch, you probably want to fall back to normal walk too. One other criterion I should have mentioned: we must be using the internal rev-list. That prevented us in v1.8.4.1 and earlier from using bitmaps for shallow fetches. But as of v1.8.4.2, we always use pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for shallow fetches. I don't think a new option is needed. The code just needs to check if there are any commit grafts. If there are, fall back to the old way. That covers both shallow fetches and some rare grafted repos. I think refs/replace/* does not impact rev walking, so we should be fine if it's used. As for repos that are themselves shallow, I do not know how doing a repack -b would fare. Probably not well. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects
On Wed, Oct 30, 2013 at 05:28:08PM +0700, Nguyen Thai Ngoc Duy wrote: One other criterion I should have mentioned: we must be using the internal rev-list. That prevented us in v1.8.4.1 and earlier from using bitmaps for shallow fetches. But as of v1.8.4.2, we always use pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for shallow fetches. I don't think a new option is needed. The code just needs to check if there are any commit grafts. If there are, fall back to the old way. That covers both shallow fetches and some rare grafted repos. I think refs/replace/* does not impact rev walking, so we should be fine if it's used. I think there are two cases that we need to consider: 1. We have a full repo and somebody requests a shallow clone for us. We probably do not want to use bitmaps here. In the series we have been testing, shallow clones turned off bitmaps because we do not use the internal rev_list. But as of cdab485 (upload-pack: delegate rev walking in shallow fetch to pack-objects), that distinction doesn't hold. I think we can check the use of --shallow-file instead of explicitly turning off bitmaps there. 2. We have a shallow clone that wants to repack. We probably want to turn off bitmap writing here. I don't think that grafts actually matter here, because pack-objects should always be looking at the true graph. It would mean that using git rev-list --use-bitmap-index does not respect the grafts, and we should probably disable it in that case (and ditto for replacements). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects
On Fri, Oct 25, 2013 at 1:03 PM, Jeff King p...@peff.net wrote: From: Vicent Marti tan...@gmail.com In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the have objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. I haven't read the bitmap creation code yet. But it probably does not matter. If the client requests a shallow fetch, you probably want to fall back to normal walk too. Bitmaps may be made work with shallow fetches too, I'm not sure. We could substract the shallow'd commits out. The problem is if some other commits share parts of the shallow'd commits, I'm not sure how to detect that. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 11/19] pack-objects: use bitmaps when packing objects
From: Vicent Marti tan...@gmail.com In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the have objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ /dev/null /dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real0m44.111s user0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout /dev/null /dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real0m5.413s user0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout /dev/null /dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real0m1.636s user0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Documentation/config.txt | 6 ++ builtin/pack-objects.c | 169 --- 2 files changed, 150 insertions(+), 25 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab26963..a981369 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1858,6 +1858,12 @@ pack.packSizeLimit:: Common unit suffixes of 'k', 'm', or 'g' are supported. +pack.useBitmaps:: + When true, git will use pack bitmaps (if available) when packing + to stdout (e.g., during the server side of a fetch). Defaults to + true. You should not generally need to turn this off unless + you are debugging pack bitmaps. + pager.cmd:: If the value is boolean, turns on or off pagination of the output of a particular Git subcommand when writing to a tty. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index faf746b..d15b9db 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -19,6 +19,7 @@ #include refs.h #include streaming.h #include thread-utils.h +#include pack-bitmap.h static const char *pack_usage[] = { N_(git pack-objects --stdout [options...] [ ref-list | object-list]), @@ -57,12 +58,23 @@ static struct progress *progress_state; static int pack_compression_level = Z_DEFAULT_COMPRESSION; static int pack_compression_seen; +static struct packed_git *reuse_packfile; +static uint32_t reuse_packfile_objects; +static off_t reuse_packfile_offset; + +static int use_bitmap_index = 1; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 256 * 1024 * 1024; static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; +enum { + OBJECT_ENTRY_EXCLUDE = (1 0), + OBJECT_ENTRY_NO_TRY_DELTA = (1 1) +}; + /* * stats */ @@ -678,6 +690,49 @@ static struct object_entry **compute_write_order(void) return wo; } +static off_t write_reused_pack(struct sha1file *f) +{ + uint8_t buffer[8192]; + off_t to_write; + int fd; + + if (!is_pack_valid(reuse_packfile)) + return 0; + + fd = git_open_noatime(reuse_packfile-pack_name); + if (fd 0) + return 0; + + if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) { + close(fd); + return 0; + } + + if (reuse_packfile_offset 0) + reuse_packfile_offset = reuse_packfile-pack_size - 20; + + to_write = reuse_packfile_offset - sizeof(struct pack_header); + + while