Re: [PATCH 0/19] pack bitmaps

2013-10-24 Thread Junio C Hamano
This is only to tentatively work around the compilation breakages;
the fixes need to be split into the respective patches that
introduce breakages when the series is rerolled (the one I sent for
pack-bitmap.c separately is also included in this message).

Thanks.
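
For readers skimming the hunks below: most of them appear to apply the same mechanical fix, moving a declaration that followed a statement up to the top of its enclosing block and leaving a plain assignment in its place. A minimal sketch of that pattern follows; clamp_run() and this min_size() are hypothetical stand-ins for illustration, not code from the series.

```c
#include <stddef.h>

/* Hypothetical stand-in for the min_size() helper used in the hunks. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical function: how many words still fit into a run.
 * The declaration sits at the top of the block; the value is
 * assigned only after the statement that precedes its first use.
 */
static size_t clamp_run(size_t number, size_t capacity, size_t used)
{
	size_t can_add;	/* declared up front, assigned below */

	if (used > capacity)
		used = capacity;

	can_add = min_size(number, capacity - used);
	return can_add;
}
```

Compilers that are told to reject declarations after statements accept this form, and behaviour is unchanged because the assignment stays where the declaration used to be.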

 ewah/ewah_bitmap.c  | 22 --
 ewah/ewah_io.c  | 44 ++--
 pack-bitmap-write.c |  2 --
 pack-bitmap.c   | 13 ++---
 4 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index b74a1eb..7986720 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -65,6 +65,8 @@ static void buffer_push_rlw(struct ewah_bitmap *self, eword_t value)
 
 static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
 {
+   eword_t runlen;
+   eword_t can_add;
size_t added = 0;
 
if (rlw_get_run_bit(self->rlw) != v && rlw_size(self->rlw) == 0) {
@@ -76,8 +78,8 @@ static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
added++;
}
 
-   eword_t runlen = rlw_get_running_len(self->rlw);
-   eword_t can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
+   runlen = rlw_get_running_len(self->rlw);
+   can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
 
rlw_set_running_len(self->rlw, runlen + can_add);
number -= can_add;
@@ -426,6 +428,8 @@ void ewah_xor(
rlwit_init(&rlw_j, ewah_j);
 
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
size_t index;
@@ -446,7 +450,7 @@ void ewah_xor(
rlwit_discard_first_words(predator, predator->rlw.running_len);
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -484,6 +488,8 @@ void ewah_and(
rlwit_init(&rlw_j, ewah_j);
 
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
 
@@ -507,7 +513,7 @@ void ewah_and(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -545,6 +551,8 @@ void ewah_and_not(
rlwit_init(&rlw_j, ewah_j);
 
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
 
@@ -572,7 +580,7 @@ void ewah_and_not(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -610,6 +618,8 @@ void ewah_or(
rlwit_init(&rlw_j, ewah_j);
 
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
 
@@ -634,7 +644,7 @@ void ewah_or(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index db6c062..05c51d9 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -58,19 +58,26 @@ int ewah_serialize_to(struct ewah_bitmap *self,
eword_t dump[2048];
const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
 
-   /* 32 bit -- bit size fr the map */
-   uint32_t bitsize =  htonl((uint32_t)self->bit_size);
+   /* 32 bit -- bit size for the map */
+   uint32_t bitsize;
+   /* 32 bit -- number of compressed 64-bit words */
+   uint32_t word_count;
+   /* 64 bit x N -- compressed words */
+   const eword_t *buffer = self->buffer;
+   size_t words_left;
+
+   /* 32 bit -- position for the RLW */
+   uint32_t rlw_pos;
+
+   bitsize =  htonl((uint32_t)self->bit_size);
if

Re: [PATCH 0/19] pack bitmaps

2013-10-24 Thread Junio C Hamano
Jeff King writes:

> A similar series has been running on github.com for the past couple of
> months, though not every repository has had bitmaps turned on (but some
> very busy ones have).  We've hopefully squeezed out all of the bugs and
> corner cases over that time. However, I did rebase this on a more modern
> version of "master"; among other conflicts, this required porting the
> git-repack changes from shell to C. So it's entirely possible I've
> introduced new bugs. :)
>
> The idea and original implementation for bitmaps come from Shawn and
> Colby, of course. The hard work in this series was done by Vicent Marti,
> and he is credited as the author in most of the patches. I've added some
> window dressing and helped a little with debugging and review. But along
> with Vicent, I should be able to help with answering questions for
> review, and as time goes on, I'm familiar enough with the code to deal
> with bugs and reviewing future changes.

Woo-hoo.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/19] pack bitmaps

2013-10-24 Thread Jeff King
This series implements JGit-style pack bitmaps to speed up fetching and
cloning. For example, here is a simulation of the server side of a clone
of a fully-packed kernel repo (measuring actual clones is harder,
because the client does a lot of work on resolving deltas):

   [before]
   $ time git pack-objects --all --stdout </dev/null >/dev/null
   Counting objects: 3237103, done.
   Compressing objects: 100% (508752/508752), done.
   Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

   real    0m44.111s
   user    0m42.396s
   sys     0m3.544s


   [after]
   $ time git pack-objects --all --stdout </dev/null >/dev/null
   Reusing existing pack: 3237103, done.
   Total 3237103 (delta 0), reused 0 (delta 0)

   real    0m1.636s
   user    0m1.460s
   sys     0m0.172s


This helps eliminate load on the server side, but it also means that we
actually start transferring objects way faster, which means the clones
finish faster. If you look at current clones of torvalds/linux from
kernel.org, it's almost two minutes before they actually start sending
you any data, during which time the client is twiddling its thumbs.

The bitmaps implemented here are compatible with those produced by JGit.
We can read JGit-produced bitmaps, and JGit can read ours. The one
exception is the final patch, which adds an optional name-hash cache.
It's added in such a way that existing implementations can ignore it,
and is marked with a flag in the header. However, JGit is very picky
about the "flags" field; it will reject any bitmap index with a flag it
does not know about.
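
To make the compatibility point concrete, here is a rough sketch of the two reader policies: a strict reader that refuses any flag bit it does not recognize, and a lenient one that masks unknown bits off and carries on. The flag names and values, and both helper functions, are assumptions made up for illustration; they are not taken from the patches or from JGit.

```c
#include <stdint.h>

/* Assumed flag bits, for illustration only. */
#define EX_OPT_FULL_DAG   0x1
#define EX_OPT_HASH_CACHE 0x4

/*
 * Strict policy: any bit outside `known` rejects the index.
 * Returns 0 to accept, -1 to reject.
 */
static int check_flags_strict(uint16_t flags, uint16_t known)
{
	return (uint16_t)(flags & ~known) ? -1 : 0;
}

/*
 * Lenient policy: ignore unknown bits and return only the
 * flags this reader understands.
 */
static uint16_t usable_flags(uint16_t flags, uint16_t known)
{
	return (uint16_t)(flags & known);
}
```

Under these assumptions, a strict reader that only knows EX_OPT_FULL_DAG rejects an index written with EX_OPT_FULL_DAG | EX_OPT_HASH_CACHE, while a lenient reader would accept it and simply skip the extra data.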

The patches are:

  [01/19]: sha1write: make buffer const-correct
  [02/19]: revindex: Export new APIs
  [03/19]: pack-objects: Refactor the packing list
  [04/19]: pack-objects: factor out name_hash
  [05/19]: revision: allow setting custom limiter function
  [06/19]: sha1_file: export `git_open_noatime`
  [07/19]: compat: add endianness helpers
  [08/19]: ewah: compressed bitmap implementation

Refactoring and support for the rest of the series.

  [09/19]: documentation: add documentation for the bitmap format
  [10/19]: pack-bitmap: add support for bitmap indexes
  [11/19]: pack-objects: use bitmaps when packing objects
  [12/19]: rev-list: add bitmap mode to speed up object lists

Bitmap reading (you can test it against JGit at this point by
running "jgit debug-gc", and then cloning or running rev-list).

  [13/19]: pack-objects: implement bitmap writing
  [14/19]: repack: stop using magic number for ARRAY_SIZE(exts)
  [15/19]: repack: turn exts array into array-of-struct
  [16/19]: repack: handle optional files created by pack-objects
  [17/19]: repack: consider bitmaps when performing repacks

Bitmap writing (you can test against JGit by running
"git repack -adb", and then running "jgit daemon" to
serve the result).

  [18/19]: t: add basic bitmap functionality tests

With reading and writing, we can do our own tests.

  [19/19]: pack-bitmap: implement optional name_hash cache

And this is our extension.

A similar series has been running on github.com for the past couple of
months, though not every repository has had bitmaps turned on (but some
very busy ones have).  We've hopefully squeezed out all of the bugs and
corner cases over that time. However, I did rebase this on a more modern
version of "master"; among other conflicts, this required porting the
git-repack changes from shell to C. So it's entirely possible I've
introduced new bugs. :)

The idea and original implementation for bitmaps come from Shawn and
Colby, of course. The hard work in this series was done by Vicent Marti,
and he is credited as the author in most of the patches. I've added some
window dressing and helped a little with debugging and review. But along
with Vicent, I should be able to help with answering questions for
review, and as time goes on, I'm familiar enough with the code to deal
with bugs and reviewing future changes.

-Peff