Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
On Mon, Oct 28, 2013 at 08:22:07AM -0700, Junio C Hamano wrote: However, the code in the ewah/ directory has been hacked up a bit from its original, and ewah_io.c _does_ include git-compat-util.h. So it may make sense to consider our copy a fork and git-ify it more. Yeah, sounds very sensible. I'll make them more git-ish for the next re-roll, then. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
On Sat, Oct 26, 2013 at 05:14:16PM +0700, Nguyen Thai Ngoc Duy wrote: If it's not mentioned yet, maybe you should note that this code currently supports only one pack with .bitmap file. Yes, there's really not any point in having multiple .bitmap files, as the bitmaps need to be closed over the set of objects. So you would have one base pack with a bunch of bitmaps. But any pack that builds on that cannot be bitmapped. JGit has the same restriction, mostly because it keeps the implementation much saner, and there is not really a good reason to lift it (if you want more things bitmapped, you probably also want them packed to share deltas). I'm happy to mention that somewhere, but I'm not sure where is most appropriate. diff --git a/khash.h b/khash.h [...] I notice the line continuations '\' in this file look more aligned if tab length is set to 4. No idea how many emacs users out there but it probably does not harm to put /* -*- mode: c; tab-width: 4; -*- */ at the beginning of this file? Another option is realign the file, which I doubt is good because this file is imported. Yes, it doesn't follow our normal whitespace guidelines. I refrained from tweaking it for the reason you mentioned (kwset.c is in a similar boat). I'd use: /* vim: set ts=4 */ myself. :) We have resisted having such modelines so far, leaving it instead to developers to configure their editor as appropriate. It is harder when there is an oddball file like this, though. I don't have a strong opinion either way. I don't see any mechanism to protect us from corrupt .bitmap files. If .bitmap is not very large, maybe just check the trailing checksum in the file when we open it? Else maybe add a crc32 or something after each commit bitmap in .bitmap v2 and only verify the ones we actually use? The .bitmap for the kernel is about 6MB of bitmaps, plus about 12MB for the name-hash cache. The loading procedure needs to read through the whole bitmap section, as the records are variable width and there is no index. But to verify the sha1 we'd have to go through the extra 12MB of name-hash cache, which we would otherwise not need to touch. Adding a crc would not be too big a deal. However, I'd really like to focus first on getting the v1 reader/writer merged. That's set in stone due to JGit, so anything we do for v2 would want to build on top anyway. +static int open_pack_bitmap(void) +{ + struct packed_git *p; + + assert(!bitmap_git.map !bitmap_git.loaded); + + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + if (open_pack_bitmap_1(p) == 0) + return 0; It maybe a good idea to go on anyway, checking for another .bitmap. Just warn the user about that if found. Yeah, that sounds reasonable. I'll squash this in: diff --git a/pack-bitmap.c b/pack-bitmap.c index 459b587..078f7c6 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -265,6 +265,12 @@ static int open_pack_bitmap_1(struct packed_git *packfile) return -1; } + if (bitmap_git.pack) { + warning(ignoring extra bitmap file: %s, idx_name); + close(fd); + return -1; + } + bitmap_git.pack = packfile; bitmap_git.map_size = xsize_t(st.st_size); bitmap_git.map = xmmap(NULL, bitmap_git.map_size, PROT_READ, MAP_PRIVATE, fd, 0); @@ -325,16 +331,17 @@ char *pack_bitmap_filename(struct packed_git *p) static int open_pack_bitmap(void) { struct packed_git *p; + int ret = -1; assert(!bitmap_git.map !bitmap_git.loaded); prepare_packed_git(); for (p = packed_git; p; p = p-next) { if (open_pack_bitmap_1(p) == 0) - return 0; + ret = 0; } - return -1; + return ret; } int prepare_bitmap_git(void) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
Jeff King p...@peff.net writes: On Fri, Oct 25, 2013 at 08:26:29PM -0400, Jeff King wrote: On Fri, Oct 25, 2013 at 04:06:19PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: diff --git a/pack-bitmap.c b/pack-bitmap.c new file mode 100644 index 000..73c52fd --- /dev/null +++ b/pack-bitmap.c @@ -0,0 +1,965 @@ +#include stdlib.h + +#include cache.h You among all people already know why this is bad, no? Yes, I am well aware of why we do not want it. I thought I removed that, but it appears I missed one. Sorry. In addition to that one, there are a few other system header includes: - the ewah/*.c files include the necessary standard headers, and do not include git-compat-util.h at all. I do not really consider these git code, but rather a black box we have imported into the tree to ease the dependency chain. So in that sense, they operate like xdiff/*.c or compat/regex/*.c, which also compile on their own (and can get away with it because they are mostly standard C and do not do system things. Right; I didn't comment on the bare inclusions of system headers in these files exactly because I shared that reasoning. However, the code in the ewah/ directory has been hacked up a bit from its original, and ewah_io.c _does_ include git-compat-util.h. So it may make sense to consider our copy a fork and git-ify it more. Yeah, sounds very sensible. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
On Fri, Oct 25, 2013 at 08:26:29PM -0400, Jeff King wrote: On Fri, Oct 25, 2013 at 04:06:19PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: diff --git a/pack-bitmap.c b/pack-bitmap.c new file mode 100644 index 000..73c52fd --- /dev/null +++ b/pack-bitmap.c @@ -0,0 +1,965 @@ +#include stdlib.h + +#include cache.h You among all people already know why this is bad, no? Yes, I am well aware of why we do not want it. I thought I removed that, but it appears I missed one. Sorry. In addition to that one, there are a few other system header includes: - the ewah/*.c files include the necessary standard headers, and do not include git-compat-util.h at all. I do not really consider these git code, but rather a black box we have imported into the tree to ease the dependency chain. So in that sense, they operate like xdiff/*.c or compat/regex/*.c, which also compile on their own (and can get away with it because they are mostly standard C and do not do system things. However, the code in the ewah/ directory has been hacked up a bit from its original, and ewah_io.c _does_ include git-compat-util.h. So it may make sense to consider our copy a fork and git-ify it more. The upstream Java EWAH library is active and separate, but the C port done by Vicent, while theoretically a separate project, does not actually have any users besides git (though presumably libgit2 will build on it at some point). - Regardless of the decision above, ewah/ewok.h includes stdlib.h, which will pollute git source files which use it. - Ditto for khash.h. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
If it's not mentioned yet, maybe you should note that this code currently supports only one pack with .bitmap file. On Fri, Oct 25, 2013 at 1:03 PM, Jeff King p...@peff.net wrote: diff --git a/khash.h b/khash.h new file mode 100644 index 000..0fdf39d --- /dev/null +++ b/khash.h @@ -0,0 +1,342 @@ I notice the line continuations '\' in this file look more aligned if tab length is set to 4. No idea how many emacs users out there but it probably does not harm to put /* -*- mode: c; tab-width: 4; -*- */ at the beginning of this file? Another option is realign the file, which I doubt is good because this file is imported. +static int load_pack_bitmap(void) +{ + assert(bitmap_git.map !bitmap_git.loaded); + + bitmap_git.bitmaps = kh_init_sha1(); + bitmap_git.ext_index.positions = kh_init_sha1_pos(); + bitmap_git.reverse_index = revindex_for_pack(bitmap_git.pack); + + if (!(bitmap_git.commits = read_bitmap_1(bitmap_git)) || + !(bitmap_git.trees = read_bitmap_1(bitmap_git)) || + !(bitmap_git.blobs = read_bitmap_1(bitmap_git)) || + !(bitmap_git.tags = read_bitmap_1(bitmap_git))) + goto failed; + + if (load_bitmap_entries_v1(bitmap_git) 0) + goto failed; I don't see any mechanism to protect us from corrupt .bitmap files. If .bitmap is not very large, maybe just check the trailing checksum in the file when we open it? Else maybe add a crc32 or something after each commit bitmap in .bitmap v2 and only verify the ones we actually use? +static int open_pack_bitmap(void) +{ + struct packed_git *p; + + assert(!bitmap_git.map !bitmap_git.loaded); + + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + if (open_pack_bitmap_1(p) == 0) + return 0; It maybe a good idea to go on anyway, checking for another .bitmap. Just warn the user about that if found. + } + + return -1; +} + -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
From: Vicent Marti tan...@gmail.com A bitmap index is a `.bitmap` file that can be found inside `$GIT_DIR/objects/pack/`, next to its corresponding packfile, and contains precalculated reachability information for selected commits. The full specification of the format for these bitmap indexes can be found in `Documentation/technical/bitmap-format.txt`. For a given commit SHA1, if it happens to be available in the bitmap index, its bitmap will represent every single object that is reachable from the commit itself. The nth bit in the bitmap is the nth object in the packfile; if it's set to 1, the object is reachable. By using the bitmaps available in the index, this commit implements several new functions: - `prepare_bitmap_git` - `prepare_bitmap_walk` - `traverse_bitmap_commit_list` - `reuse_partial_packfile_from_bitmap` The `prepare_bitmap_walk` function tries to build a bitmap of all the objects that can be reached from the commit roots of a given `rev_info` struct by using the following algorithm: - If all the interesting commits for a revision walk are available in the index, the resulting reachability bitmap is the bitwise OR of all the individual bitmaps. - When the full set of WANTs is not available in the index, we perform a partial revision walk using the commits that don't have bitmaps as roots, and limiting the revision walk as soon as we reach a commit that has a corresponding bitmap. The earlier OR'ed bitmap with all the indexed commits can now be completed as this walk progresses, so the end result is the full reachability list. - For revision walks with a HAVEs set (a set of commits that are deemed uninteresting), first we perform the same method as for the WANTs, but using our HAVEs as roots, in order to obtain a full reachability bitmap of all the uninteresting commits. This bitmap then can be used to: a) limit the subsequent walk when building the WANTs bitmap b) finding the final set of interesting commits by performing an AND-NOT of the WANTs and the HAVEs. If `prepare_bitmap_walk` runs successfully, the resulting bitmap is stored and the equivalent of a `traverse_commit_list` call can be performed by using `traverse_bitmap_commit_list`; the bitmap version of this call yields the objects straight from the packfile index (without having to look them up or parse them) and hence is several orders of magnitude faster. As an extra optimization, when `prepare_bitmap_walk` succeeds, the `reuse_partial_packfile_from_bitmap` call can be attempted: it will find the amount of objects at the beginning of the on-disk packfile that can be reused as-is, and return an offset into the packfile. The source packfile can then be loaded and the bytes up to `offset` can be written directly to the result without having to consider the entires inside the packfile individually. If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files are available), the `rev_info` struct is left untouched, and can be used to perform a manual rev-walk using `traverse_commit_list`. Hence, this new set of functions are a generic API that allows to perform the equivalent of git rev-list --objects [roots...] [^uninteresting...] for any set of commits, even if they don't have specific bitmaps generated for them. In further patches, we'll use this bitmap traversal optimization to speed up the `pack-objects` and `rev-list` commands. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Makefile | 2 + khash.h | 342 + pack-bitmap.c | 965 ++ pack-bitmap.h | 46 +++ 4 files changed, 1355 insertions(+) create mode 100644 khash.h create mode 100644 pack-bitmap.c create mode 100644 pack-bitmap.h diff --git a/Makefile b/Makefile index eacb940..737766c 100644 --- a/Makefile +++ b/Makefile @@ -699,6 +699,7 @@ LIB_H += object.h LIB_H += pack-objects.h LIB_H += pack-revindex.h LIB_H += pack.h +LIB_H += pack-bitmap.h LIB_H += parse-options.h LIB_H += patch-ids.h LIB_H += pathspec.h @@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += object.o +LIB_OBJS += pack-bitmap.o LIB_OBJS += pack-check.o LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o diff --git a/khash.h b/khash.h new file mode 100644 index 000..0fdf39d --- /dev/null +++ b/khash.h @@ -0,0 +1,342 @@ +/* The MIT License + + Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + Software), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is
Re: [PATCH v2 10/19] pack-bitmap: add support for bitmap indexes
Jeff King p...@peff.net writes: diff --git a/pack-bitmap.c b/pack-bitmap.c new file mode 100644 index 000..73c52fd --- /dev/null +++ b/pack-bitmap.c @@ -0,0 +1,965 @@ +#include stdlib.h + +#include cache.h You among all people already know why this is bad, no? I'll remove the first two lines while queuing. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html