Re: [PATCH 0/19] pack bitmaps
This is only to tentatively work around the compilation breakages; the
fixes need to be split into the respective patches that introduce
breakages when the series is rerolled (the one I sent for pack-bitmap.c
separately is also included in this message).

Thanks.

 ewah/ewah_bitmap.c  | 22 --
 ewah/ewah_io.c      | 44 ++--
 pack-bitmap-write.c |  2 --
 pack-bitmap.c       | 13 ++---
 4 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index b74a1eb..7986720 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -65,6 +65,8 @@ static void buffer_push_rlw(struct ewah_bitmap *self, eword_t value)
 
 static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
 {
+	eword_t runlen;
+	eword_t can_add;
 	size_t added = 0;
 
 	if (rlw_get_run_bit(self->rlw) != v && rlw_size(self->rlw) == 0) {
@@ -76,8 +78,8 @@ static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
 		added++;
 	}
 
-	eword_t runlen = rlw_get_running_len(self->rlw);
-	eword_t can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
+	runlen = rlw_get_running_len(self->rlw);
+	can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
 
 	rlw_set_running_len(self->rlw, runlen + can_add);
 	number -= can_add;
@@ -426,6 +428,8 @@ void ewah_xor(
 	rlwit_init(&rlw_j, ewah_j);
 
 	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+		size_t literals;
+
 		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
 			struct rlw_iterator *prey, *predator;
 			size_t index;
@@ -446,7 +450,7 @@ void ewah_xor(
 			rlwit_discard_first_words(predator, predator->rlw.running_len);
 		}
 
-		size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+		literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
 		if (literals) {
 			size_t k;
@@ -484,6 +488,8 @@ void ewah_and(
 	rlwit_init(&rlw_j, ewah_j);
 
 	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+		size_t literals;
+
 		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
 			struct rlw_iterator *prey, *predator;
 
@@ -507,7 +513,7 @@ void ewah_and(
 			}
 		}
 
-		size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+		literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
 		if (literals) {
 			size_t k;
@@ -545,6 +551,8 @@ void ewah_and_not(
 	rlwit_init(&rlw_j, ewah_j);
 
 	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+		size_t literals;
+
 		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
 			struct rlw_iterator *prey, *predator;
 
@@ -572,7 +580,7 @@ void ewah_and_not(
 			}
 		}
 
-		size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+		literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
 		if (literals) {
 			size_t k;
@@ -610,6 +618,8 @@ void ewah_or(
 	rlwit_init(&rlw_j, ewah_j);
 
 	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+		size_t literals;
+
 		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
 			struct rlw_iterator *prey, *predator;
 
@@ -634,7 +644,7 @@ void ewah_or(
 			}
 		}
 
-		size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+		literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
 		if (literals) {
 			size_t k;
diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index db6c062..05c51d9 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -58,19 +58,26 @@ int ewah_serialize_to(struct ewah_bitmap *self,
 	eword_t dump[2048];
 	const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
-	/* 32 bit -- bit size fr the map */
-	uint32_t bitsize = htonl((uint32_t)self->bit_size);
+	/* 32 bit -- bit size for the map */
+	uint32_t bitsize;
+	/* 32 bit -- number of compressed 64-bit words */
+	uint32_t word_count;
+	/* 64 bit x N -- compressed words */
+	const eword_t *buffer = self->buffer;
+	size_t words_left;
+
+	/* 32 bit -- position for the RLW */
+	uint32_t rlw_pos;
+
+	bitsize = htonl((uint32_t)self->bit_size);
 	if
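The ewah_bitmap.c hunks above all apply one pattern: a declaration that followed a statement is hoisted to the top of its enclosing block, and the old initializer becomes a plain assignment at the original spot. Presumably (this is an assumption, the message only says "compilation breakages") the build rejects declarations after statements, as C89 compilers and GCC's -Wdeclaration-after-statement do. A minimal standalone sketch of the pattern follows; count_common_literals() and its arguments are hypothetical and not part of the series, and min_size() is a local stand-in named after the calls in the diff.

```c
#include <stddef.h>

/* Local stand-in for the min_size() helper called in the diff. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical loop shaped like the ones in ewah_and()/ewah_or():
 * "literals" is declared at the top of the block and only assigned
 * after an earlier statement has run, so the code is valid C89.
 */
static size_t count_common_literals(const size_t *lhs, const size_t *rhs, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t literals;	/* declaration first ... */

		if (!lhs[i] && !rhs[i])
			continue;	/* ... then a statement ... */

		/* ... and the former initializer as an assignment */
		literals = min_size(lhs[i], rhs[i]);
		total += literals;
	}
	return total;
}
```

Behavior is unchanged by the hoisting; only the point of declaration moves, which is why the same two-line change repeats in each function.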
Re: [PATCH 0/19] pack bitmaps
Jeff King writes:

> A similar series has been running on github.com for the past couple of
> months, though not every repository has had bitmaps turned on (but some
> very busy ones have). We've hopefully squeezed out all of the bugs and
> corner cases over that time. However, I did rebase this on a more modern
> version of "master"; among other conflicts, this required porting the
> git-repack changes from shell to C. So it's entirely possible I've
> introduced new bugs. :)
>
> The idea and original implementation for bitmaps comes from Shawn and
> Colby, of course. The hard work in this series was done by Vicent Marti,
> and he is credited as the author in most of the patches. I've added some
> window dressing and helped a little with debugging and review. But along
> with Vicent, I should be able to help with answering questions for
> review, and as time goes on, I'm familiar enough with the code to deal
> with bugs and reviewing future changes.

Woo-hoo.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/19] pack bitmaps
This series implements JGit-style pack bitmaps to speed up fetching and
cloning. For example, here is a simulation of the server side of a clone
of a fully-packed kernel repo (measuring actual clones is harder, because
the client does a lot of work on resolving deltas):

  [before]
  $ time git pack-objects --all --stdout </dev/null >/dev/null
  Counting objects: 3237103, done.
  Compressing objects: 100% (508752/508752), done.
  Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

  real    0m44.111s
  user    0m42.396s
  sys     0m3.544s

  [after]
  $ time git pack-objects --all --stdout </dev/null >/dev/null
  Reusing existing pack: 3237103, done.
  Total 3237103 (delta 0), reused 0 (delta 0)

  real    0m1.636s
  user    0m1.460s
  sys     0m0.172s

This helps eliminate load on the server side, but it also means that we
actually start transferring objects way faster, which means the clones
finish faster. If you look at current clones of torvalds/linux from
kernel.org, it's almost two minutes before they actually start sending
you any data, during which time the client is twiddling its thumbs.

The bitmaps implemented here are compatible with those produced by JGit.
We can read JGit-produced bitmaps, and JGit can read ours. The one
exception is the final patch, which adds an optional name-hash cache.
It's added in such a way that existing implementations can ignore it, and
is marked with a flag in the header. However, JGit is very picky about
the "flags" field; it will reject any bitmap index with a flag it does
not know about.

The patches are:

  [01/19]: sha1write: make buffer const-correct
  [02/19]: revindex: Export new APIs
  [03/19]: pack-objects: Refactor the packing list
  [04/19]: pack-objects: factor out name_hash
  [05/19]: revision: allow setting custom limiter function
  [06/19]: sha1_file: export `git_open_noatime`
  [07/19]: compat: add endianness helpers
  [08/19]: ewah: compressed bitmap implementation

    Refactoring and support for the rest of the series.

  [09/19]: documentation: add documentation for the bitmap format
  [10/19]: pack-bitmap: add support for bitmap indexes
  [11/19]: pack-objects: use bitmaps when packing objects
  [12/19]: rev-list: add bitmap mode to speed up object lists

    Bitmap reading (you can test it against JGit at this point by
    running "jgit debug-gc", and then cloning or running rev-list).

  [13/19]: pack-objects: implement bitmap writing
  [14/19]: repack: stop using magic number for ARRAY_SIZE(exts)
  [15/19]: repack: turn exts array into array-of-struct
  [16/19]: repack: handle optional files created by pack-objects
  [17/19]: repack: consider bitmaps when performing repacks

    Bitmap writing (you can test against JGit by running "git repack
    -adb", and then running "jgit daemon" to serve the result).

  [18/19]: t: add basic bitmap functionality tests

    With reading and writing, we can do our own tests.

  [19/19]: pack-bitmap: implement optional name_hash cache

    And this is our extension.

A similar series has been running on github.com for the past couple of
months, though not every repository has had bitmaps turned on (but some
very busy ones have). We've hopefully squeezed out all of the bugs and
corner cases over that time. However, I did rebase this on a more modern
version of "master"; among other conflicts, this required porting the
git-repack changes from shell to C. So it's entirely possible I've
introduced new bugs. :)

The idea and original implementation for bitmaps comes from Shawn and
Colby, of course. The hard work in this series was done by Vicent Marti,
and he is credited as the author in most of the patches. I've added some
window dressing and helped a little with debugging and review. But along
with Vicent, I should be able to help with answering questions for
review, and as time goes on, I'm familiar enough with the code to deal
with bugs and reviewing future changes.
-Peff
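The cover letter contrasts two ways a reader can treat the header "flags" field: existing implementations can ignore the optional name-hash flag, while JGit rejects any bitmap index carrying a flag it does not know. The sketch below only illustrates those two policies. The flag constants, their values, the 16-bit width, and both function names are assumptions made for the example; the real definitions live in the bitmap format documentation added by patch 09/19.

```c
#include <stdint.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define EX_FLAG_BASE       0x1u	/* a flag every reader understands */
#define EX_FLAG_NAME_HASH  0x4u	/* the optional extension */

/*
 * Strict policy (the behavior described for JGit): refuse the index
 * if any bit outside "known" is set. Returns 0 to accept, -1 to reject.
 */
static int strict_reader_check(uint16_t flags, uint16_t known)
{
	return (uint16_t)(flags & ~known) ? -1 : 0;
}

/*
 * Lenient policy: unknown bits are skipped, and only the features the
 * reader knows about are reported back as usable.
 */
static uint16_t lenient_reader_usable(uint16_t flags, uint16_t known)
{
	return (uint16_t)(flags & known);
}
```

Under these assumptions, an index written with both bits set is rejected by a strict reader that only knows EX_FLAG_BASE, while a lenient reader with the same knowledge still uses the base bitmaps and skips the cache.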