Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-30 Thread Jeff King
On Mon, Oct 28, 2013 at 08:22:07AM -0700, Junio C Hamano wrote:

  However, the code in the ewah/ directory has been hacked up a bit
  from its original, and ewah_io.c _does_ include git-compat-util.h.
  So it may make sense to consider our copy a fork and git-ify it
  more.
 
 Yeah, sounds very sensible.

I'll make them more git-ish for the next re-roll, then.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-30 Thread Jeff King
On Sat, Oct 26, 2013 at 05:14:16PM +0700, Nguyen Thai Ngoc Duy wrote:

 If it's not mentioned yet, maybe you should note that this code
 currently supports only one pack with .bitmap file.

Yes, there's really not any point in having multiple .bitmap files, as
the bitmaps need to be closed over the set of objects. So you would have
one base pack with a bunch of bitmaps. But any pack that builds on
that cannot be bitmapped.

JGit has the same restriction, mostly because it keeps the
implementation much saner, and there is not really a good reason to lift
it (if you want more things bitmapped, you probably also want them
packed to share deltas).

I'm happy to mention that somewhere, but I'm not sure where is most
appropriate.

  diff --git a/khash.h b/khash.h
 [...]
 I notice the line continuations '\' in this file look more aligned if
 tab length is set to 4. No idea how many emacs users out there but it
 probably does not harm to put
 
 /* -*- mode: c; tab-width: 4; -*- */
 
 at the beginning of this file? Another option is realign the file,
 which I doubt is good because this file is imported.

Yes, it doesn't follow our normal whitespace guidelines. I refrained
from tweaking it for the reason you mentioned (kwset.c is in a similar
boat).

I'd use:

  /* vim: set ts=4 */

myself. :) We have resisted having such modelines so far, leaving it
instead to developers to configure their editor as appropriate. It is
harder when there is an oddball file like this, though. I don't have a
strong opinion either way.

 I don't see any mechanism to protect us from corrupt .bitmap files. If
 .bitmap is not very large, maybe just check the trailing checksum in
 the file when we open it? Else maybe add a crc32 or something after
 each commit bitmap in .bitmap v2 and only verify the ones we actually
 use?

The .bitmap for the kernel is about 6MB of bitmaps, plus about 12MB for
the name-hash cache. The loading procedure needs to read through the
whole bitmap section, as the records are variable width and there is no
index. But to verify the sha1 we'd have to go through the extra 12MB of
name-hash cache, which we would otherwise not need to touch.

Adding a crc would not be too big a deal. However, I'd really like to
focus first on getting the v1 reader/writer merged. That's set in stone
due to JGit, so anything we do for v2 would want to build on top anyway.

  +static int open_pack_bitmap(void)
  +{
  +   struct packed_git *p;
  +
  +   assert(!bitmap_git.map  !bitmap_git.loaded);
  +
  +   prepare_packed_git();
  +   for (p = packed_git; p; p = p-next) {
  +   if (open_pack_bitmap_1(p) == 0)
  +   return 0;
 
 It maybe a good idea to go on anyway, checking for another .bitmap.
 Just warn the user about that if found.

Yeah, that sounds reasonable. I'll squash this in:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 459b587..078f7c6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -265,6 +265,12 @@ static int open_pack_bitmap_1(struct packed_git *packfile)
return -1;
}
 
+   if (bitmap_git.pack) {
+   warning(ignoring extra bitmap file: %s, idx_name);
+   close(fd);
+   return -1;
+   }
+
bitmap_git.pack = packfile;
bitmap_git.map_size = xsize_t(st.st_size);
bitmap_git.map = xmmap(NULL, bitmap_git.map_size, PROT_READ, 
MAP_PRIVATE, fd, 0);
@@ -325,16 +331,17 @@ char *pack_bitmap_filename(struct packed_git *p)
 static int open_pack_bitmap(void)
 {
struct packed_git *p;
+   int ret = -1;
 
assert(!bitmap_git.map  !bitmap_git.loaded);
 
prepare_packed_git();
for (p = packed_git; p; p = p-next) {
if (open_pack_bitmap_1(p) == 0)
-   return 0;
+   ret = 0;
}
 
-   return -1;
+   return ret;
 }
 
 int prepare_bitmap_git(void)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-28 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Oct 25, 2013 at 08:26:29PM -0400, Jeff King wrote:

 On Fri, Oct 25, 2013 at 04:06:19PM -0700, Junio C Hamano wrote:
 
  Jeff King p...@peff.net writes:
  
   diff --git a/pack-bitmap.c b/pack-bitmap.c
   new file mode 100644
   index 000..73c52fd
   --- /dev/null
   +++ b/pack-bitmap.c
   @@ -0,0 +1,965 @@
   +#include stdlib.h
   +
   +#include cache.h
  
  You among all people already know why this is bad, no?
 
 Yes, I am well aware of why we do not want it. I thought I removed that,
 but it appears I missed one. Sorry.

 In addition to that one, there are a few other system header includes:

   - the ewah/*.c files include the necessary standard headers, and do
 not include git-compat-util.h at all. I do not really consider these
 git code, but rather a black box we have imported into the tree
 to ease the dependency chain. So in that sense, they operate like
 xdiff/*.c or compat/regex/*.c, which also compile on
 their own (and can get away with it because they are mostly standard
 C and do not do system things.

Right; I didn't comment on the bare inclusions of system headers in
these files exactly because I shared that reasoning.

 However, the code in the ewah/ directory has been hacked up a bit
 from its original, and ewah_io.c _does_ include git-compat-util.h.
 So it may make sense to consider our copy a fork and git-ify it
 more.

Yeah, sounds very sensible.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-26 Thread Jeff King
On Fri, Oct 25, 2013 at 08:26:29PM -0400, Jeff King wrote:

 On Fri, Oct 25, 2013 at 04:06:19PM -0700, Junio C Hamano wrote:
 
  Jeff King p...@peff.net writes:
  
   diff --git a/pack-bitmap.c b/pack-bitmap.c
   new file mode 100644
   index 000..73c52fd
   --- /dev/null
   +++ b/pack-bitmap.c
   @@ -0,0 +1,965 @@
   +#include stdlib.h
   +
   +#include cache.h
  
  You among all people already know why this is bad, no?
 
 Yes, I am well aware of why we do not want it. I thought I removed that,
 but it appears I missed one. Sorry.

In addition to that one, there are a few other system header includes:

  - the ewah/*.c files include the necessary standard headers, and do
not include git-compat-util.h at all. I do not really consider these
git code, but rather a black box we have imported into the tree
to ease the dependency chain. So in that sense, they operate like
xdiff/*.c or compat/regex/*.c, which also compile on
their own (and can get away with it because they are mostly standard
C and do not do system things.

However, the code in the ewah/ directory has been hacked up a bit
from its original, and ewah_io.c _does_ include git-compat-util.h.
So it may make sense to consider our copy a fork and git-ify it
more. The upstream Java EWAH library is active and separate, but the
C port done by Vicent, while theoretically a separate project, does
not actually have any users besides git (though presumably libgit2
will build on it at some point).

  - Regardless of the decision above, ewah/ewok.h includes stdlib.h,
which will pollute git source files which use it.

  - Ditto for khash.h.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-26 Thread Duy Nguyen
If it's not mentioned yet, maybe you should note that this code
currently supports only one pack with .bitmap file.

On Fri, Oct 25, 2013 at 1:03 PM, Jeff King p...@peff.net wrote:
 diff --git a/khash.h b/khash.h
 new file mode 100644
 index 000..0fdf39d
 --- /dev/null
 +++ b/khash.h
 @@ -0,0 +1,342 @@

I notice the line continuations '\' in this file look more aligned if
tab length is set to 4. No idea how many emacs users out there but it
probably does not harm to put

/* -*- mode: c; tab-width: 4; -*- */

at the beginning of this file? Another option is realign the file,
which I doubt is good because this file is imported.

 +static int load_pack_bitmap(void)
 +{
 +   assert(bitmap_git.map  !bitmap_git.loaded);
 +
 +   bitmap_git.bitmaps = kh_init_sha1();
 +   bitmap_git.ext_index.positions = kh_init_sha1_pos();
 +   bitmap_git.reverse_index = revindex_for_pack(bitmap_git.pack);
 +
 +   if (!(bitmap_git.commits = read_bitmap_1(bitmap_git)) ||
 +   !(bitmap_git.trees = read_bitmap_1(bitmap_git)) ||
 +   !(bitmap_git.blobs = read_bitmap_1(bitmap_git)) ||
 +   !(bitmap_git.tags = read_bitmap_1(bitmap_git)))
 +   goto failed;
 +
 +   if (load_bitmap_entries_v1(bitmap_git)  0)
 +   goto failed;

I don't see any mechanism to protect us from corrupt .bitmap files. If
.bitmap is not very large, maybe just check the trailing checksum in
the file when we open it? Else maybe add a crc32 or something after
each commit bitmap in .bitmap v2 and only verify the ones we actually
use?

 +static int open_pack_bitmap(void)
 +{
 +   struct packed_git *p;
 +
 +   assert(!bitmap_git.map  !bitmap_git.loaded);
 +
 +   prepare_packed_git();
 +   for (p = packed_git; p; p = p-next) {
 +   if (open_pack_bitmap_1(p) == 0)
 +   return 0;

It maybe a good idea to go on anyway, checking for another .bitmap.
Just warn the user about that if found.

 +   }
 +
 +   return -1;
 +}
 +
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-25 Thread Jeff King
From: Vicent Marti tan...@gmail.com

A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

- `prepare_bitmap_git`
- `prepare_bitmap_walk`
- `traverse_bitmap_commit_list`
- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

a) limit the subsequent walk when building the WANTs bitmap
b) finding the final set of interesting commits by performing an
   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile  |   2 +
 khash.h   | 342 +
 pack-bitmap.c | 965 ++
 pack-bitmap.h |  46 +++
 4 files changed, 1355 insertions(+)
 create mode 100644 khash.h
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

diff --git a/Makefile b/Makefile
index eacb940..737766c 100644
--- a/Makefile
+++ b/Makefile
@@ -699,6 +699,7 @@ LIB_H += object.h
 LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += pack-bitmap.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
@@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
diff --git a/khash.h b/khash.h
new file mode 100644
index 000..0fdf39d
--- /dev/null
+++ b/khash.h
@@ -0,0 +1,342 @@
+/* The MIT License
+
+   Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk
+
+   Permission is hereby granted, free of charge, to any person obtaining
+   a copy of this software and associated documentation files (the
+   Software), to deal in the Software without restriction, including
+   without limitation the rights to use, copy, modify, merge, publish,
+   distribute, sublicense, and/or sell copies of the Software, and to
+   permit persons to whom the Software is 

Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes

2013-10-25 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 diff --git a/pack-bitmap.c b/pack-bitmap.c
 new file mode 100644
 index 000..73c52fd
 --- /dev/null
 +++ b/pack-bitmap.c
 @@ -0,0 +1,965 @@
 +#include stdlib.h
 +
 +#include cache.h

You among all people already know why this is bad, no?

I'll remove the first two lines while queuing.  Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html