Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-31 Thread Duy Nguyen
On Thu, Oct 31, 2013 at 3:07 AM, Jeff King p...@peff.net wrote:
 I think there are two cases that we need to consider:

   1. We have a full repo and somebody requests a shallow clone for us.
  We probably do not want to use bitmaps here.  In the series we have
  been testing, shallow clones turned off bitmaps because we do not
  use the internal rev_list. But as of cdab485 (upload-pack: delegate
  rev walking in shallow fetch to pack-objects), that distinction
  doesn't hold. I think we can check the use of --shallow-file
  instead of explicitly turning off bitmaps there.

There's an (non-existing yet) case 1': somebody requests a clone and
the source clone is already shallow. is_repository_shallow() could
catch both cases.

   2. We have a shallow clone that wants to repack. We probably want to
  turn off bitmap writing here. I don't think that grafts actually
  matter here, because pack-objects should always be looking at the
  true graph. It would mean that using git rev-list
  --use-bitmap-index does not respect the grafts, and we should
  probably disable it in that case (and ditto for replacements).

Right. I forgot that the repo must be complete before it's grafted.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-30 Thread Jeff King
On Sat, Oct 26, 2013 at 05:25:14PM +0700, Nguyen Thai Ngoc Duy wrote:

  For bitmaps to be used, the following must be true:
 
1. We must be packing to stdout (as a normal `pack-objects` from
   `upload-pack` would do).
 
2. There must be a .bitmap index containing at least one of the
   have objects that the client is asking for.
 
3. Bitmaps must be enabled (they are enabled by default, but can be
   disabled by setting `pack.usebitmaps` to false, or by using
   `--no-use-bitmap-index` on the command-line).
 
  If any of these is not true, we fall back to doing a normal walk of the
  object graph.
 
 I haven't read the bitmap creation code yet. But it probably does not
 matter. If the client requests a shallow fetch, you probably want to
 fall back to normal walk too.

One other criterion I should have mentioned: we must be using the
internal rev-list. That prevented us in v1.8.4.1 and earlier from using
bitmaps for shallow fetches. But as of v1.8.4.2, we always use
pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for
shallow fetches.

As for repos that are themselves shallow, I do not know how doing a
repack -b would fare. Probably not well.

 Bitmaps may be made work with shallow fetches too, I'm not sure. We
 could substract the shallow'd commits out. The problem is if some
 other commits share parts of the shallow'd commits, I'm not sure how
 to detect that.

Yeah, I do not think shallow follows the same have/want reachability
rules that we rely on for taking the set difference via bitmaps. It may
be made to work eventually, but I think the important thing at this
point is making sure we properly fall back to the slow method when
shallow.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-30 Thread Duy Nguyen
On Wed, Oct 30, 2013 at 2:36 PM, Jeff King p...@peff.net wrote:
 On Sat, Oct 26, 2013 at 05:25:14PM +0700, Nguyen Thai Ngoc Duy wrote:

  For bitmaps to be used, the following must be true:
 
1. We must be packing to stdout (as a normal `pack-objects` from
   `upload-pack` would do).
 
2. There must be a .bitmap index containing at least one of the
   have objects that the client is asking for.
 
3. Bitmaps must be enabled (they are enabled by default, but can be
   disabled by setting `pack.usebitmaps` to false, or by using
   `--no-use-bitmap-index` on the command-line).
 
  If any of these is not true, we fall back to doing a normal walk of the
  object graph.

 I haven't read the bitmap creation code yet. But it probably does not
 matter. If the client requests a shallow fetch, you probably want to
 fall back to normal walk too.

 One other criterion I should have mentioned: we must be using the
 internal rev-list. That prevented us in v1.8.4.1 and earlier from using
 bitmaps for shallow fetches. But as of v1.8.4.2, we always use
 pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for
 shallow fetches.

I don't think a new option is needed. The code just needs to check if
there are any commit grafts. If there are, fall back to the old way.
That covers both shallow fetches and some rare grafted repos. I think
refs/replace/* does not impact rev walking, so we should be fine if
it's used.

 As for repos that are themselves shallow, I do not know how doing a
 repack -b would fare. Probably not well.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-30 Thread Jeff King
On Wed, Oct 30, 2013 at 05:28:08PM +0700, Nguyen Thai Ngoc Duy wrote:

  One other criterion I should have mentioned: we must be using the
  internal rev-list. That prevented us in v1.8.4.1 and earlier from using
  bitmaps for shallow fetches. But as of v1.8.4.2, we always use
  pack-objects' rev-walker. We may need to pass --no-use-bitmap-index for
  shallow fetches.
 
 I don't think a new option is needed. The code just needs to check if
 there are any commit grafts. If there are, fall back to the old way.
 That covers both shallow fetches and some rare grafted repos. I think
 refs/replace/* does not impact rev walking, so we should be fine if
 it's used.

I think there are two cases that we need to consider:

  1. We have a full repo and somebody requests a shallow clone for us.
 We probably do not want to use bitmaps here.  In the series we have
 been testing, shallow clones turned off bitmaps because we do not
 use the internal rev_list. But as of cdab485 (upload-pack: delegate
 rev walking in shallow fetch to pack-objects), that distinction
 doesn't hold. I think we can check the use of --shallow-file
 instead of explicitly turning off bitmaps there.

  2. We have a shallow clone that wants to repack. We probably want to
 turn off bitmap writing here. I don't think that grafts actually
 matter here, because pack-objects should always be looking at the
 true graph. It would mean that using git rev-list
 --use-bitmap-index does not respect the grafts, and we should
 probably disable it in that case (and ditto for replacements).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-26 Thread Duy Nguyen
On Fri, Oct 25, 2013 at 1:03 PM, Jeff King p...@peff.net wrote:
 From: Vicent Marti tan...@gmail.com

 In this patch, we use the bitmap API to perform the `Counting Objects`
 phase in pack-objects, rather than a traditional walk through the object
 graph. For a reasonably-packed large repo, the time to fetch and clone
 is often dominated by the full-object revision walk during the Counting
 Objects phase. Using bitmaps can reduce the CPU time required on the
 server (and therefore start sending the actual pack data with less
 delay).

 For bitmaps to be used, the following must be true:

   1. We must be packing to stdout (as a normal `pack-objects` from
  `upload-pack` would do).

   2. There must be a .bitmap index containing at least one of the
  have objects that the client is asking for.

   3. Bitmaps must be enabled (they are enabled by default, but can be
  disabled by setting `pack.usebitmaps` to false, or by using
  `--no-use-bitmap-index` on the command-line).

 If any of these is not true, we fall back to doing a normal walk of the
 object graph.

I haven't read the bitmap creation code yet. But it probably does not
matter. If the client requests a shallow fetch, you probably want to
fall back to normal walk too.

Bitmaps may be made work with shallow fetches too, I'm not sure. We
could substract the shallow'd commits out. The problem is if some
other commits share parts of the shallow'd commits, I'm not sure how
to detect that.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 11/19] pack-objects: use bitmaps when packing objects

2013-10-25 Thread Jeff King
From: Vicent Marti tan...@gmail.com

In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).

For bitmaps to be used, the following must be true:

  1. We must be packing to stdout (as a normal `pack-objects` from
 `upload-pack` would do).

  2. There must be a .bitmap index containing at least one of the
 have objects that the client is asking for.

  3. Bitmaps must be enabled (they are enabled by default, but can be
 disabled by setting `pack.usebitmaps` to false, or by using
 `--no-use-bitmap-index` on the command-line).

If any of these is not true, we fall back to doing a normal walk of the
object graph.

Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:

[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
/dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m44.111s
user0m42.396s
sys 0m3.544s

[bitmaps only, without partial pack reuse; note that
 pack reuse is automatic, so timing this required a
 patch to disable it]
$ time git pack-objects --all --stdout /dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m5.413s
user0m5.604s
sys 0m1.804s

[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout /dev/null /dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)

real0m1.636s
user0m1.460s
sys 0m0.172s

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt |   6 ++
 builtin/pack-objects.c   | 169 ---
 2 files changed, 150 insertions(+), 25 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab26963..a981369 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1858,6 +1858,12 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
 
+pack.useBitmaps::
+   When true, git will use pack bitmaps (if available) when packing
+   to stdout (e.g., during the server side of a fetch). Defaults to
+   true. You should not generally need to turn this off unless
+   you are debugging pack bitmaps.
+
 pager.cmd::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index faf746b..d15b9db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -19,6 +19,7 @@
 #include refs.h
 #include streaming.h
 #include thread-utils.h
+#include pack-bitmap.h
 
 static const char *pack_usage[] = {
N_(git pack-objects --stdout [options...] [ ref-list |  
object-list]),
@@ -57,12 +58,23 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static struct packed_git *reuse_packfile;
+static uint32_t reuse_packfile_objects;
+static off_t reuse_packfile_offset;
+
+static int use_bitmap_index = 1;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+enum {
+   OBJECT_ENTRY_EXCLUDE = (1  0),
+   OBJECT_ENTRY_NO_TRY_DELTA = (1  1)
+};
+
 /*
  * stats
  */
@@ -678,6 +690,49 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
+static off_t write_reused_pack(struct sha1file *f)
+{
+   uint8_t buffer[8192];
+   off_t to_write;
+   int fd;
+
+   if (!is_pack_valid(reuse_packfile))
+   return 0;
+
+   fd = git_open_noatime(reuse_packfile-pack_name);
+   if (fd  0)
+   return 0;
+
+   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) {
+   close(fd);
+   return 0;
+   }
+
+   if (reuse_packfile_offset  0)
+   reuse_packfile_offset = reuse_packfile-pack_size - 20;
+
+   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+
+   while