Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Mon, Aug 6, 2018 at 8:54 PM Christian Couder wrote: > > Is it worth it? The "pahole" comment in this file is up to date. We > > use 80 bytes per object. This series makes the struct 88 bytes (I've > > just rerun pahole). > > Did you run it on V1 or on V2? I guess on V2, but then what do you > think about converting the 'layer' field into a bit field, which might > be simpler and save space? V2. I kinda ignored the layer field because Jeff was suggesting that it may not be needed. It may be better to just move it out though because it's not that complicated and even if we have enough bits left, they may be spread out that it's hard to reorder so that they're grouped together and use can use them. > > On linux repo with 12M objects, "git pack-objects > > --all" needs extra 96MB memory even if this feature is not used. So > > yes I still think it's worth moving these fields out of struct > > object_entry. > > And what about the fields already in struct object_entry? While I am > at it, I think I could move some of them too if it is really so worth Way ahead of you. In v2.17.0 this struct is 136 bytes so going down to 80 bytes now is a big memory saving. I think I squeezed everything possible and was a bit too aggressive that I created a new performance regression in v2.18.0 ;-) -- Duy
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Mon, Aug 6, 2018 at 5:53 PM, Duy Nguyen wrote: > On Sun, Aug 5, 2018 at 8:53 PM Christian Couder > wrote: >> >> As you can see the patch 6/6 (in the v2 of this patch series that I >> just sent) moves `unsigned int tree_depth` from 'struct object_entry' >> to 'struct packing_data'. I am not sure that I did it right and that >> it is worth it though as it is a bit more complex. > > It is a bit more complex than I expected. But I think if you go with > Jeff's suggestion (i.e. think of the new tree_depth array an extension > of objects array) it's a bit simpler: you access both arrays using the > same index, both arrays should have the same size and be realloc'd at > the same time in packlist_alloc(). Ok, I will take a look at doing that to simplify things. Thanks to Peff and you for that suggestion! > Is it worth it? The "pahole" comment in this file is up to date. We > use 80 bytes per object. This series makes the struct 88 bytes (I've > just rerun pahole). Did you run it on V1 or on V2? I guess on V2, but then what do you think about converting the 'layer' field into a bit field, which might be simpler and save space? > On linux repo with 12M objects, "git pack-objects > --all" needs extra 96MB memory even if this feature is not used. So > yes I still think it's worth moving these fields out of struct > object_entry. And what about the fields already in struct object_entry? While I am at it, I think I could move some of them too if it is really so worth it.
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Aug 5, 2018 at 8:53 PM Christian Couder wrote: > > On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen wrote: > > On Sun, Jul 22, 2018 at 7:51 AM Christian Couder > > wrote: > > >> diff --git a/pack-objects.h b/pack-objects.h > >> index edf74dabdd..8eecd67991 100644 > >> --- a/pack-objects.h > >> +++ b/pack-objects.h > >> @@ -100,6 +100,10 @@ struct object_entry { > >> unsigned type_:TYPE_BITS; > >> unsigned no_try_delta:1; > >> unsigned in_pack_type:TYPE_BITS; /* could be delta */ > >> + > >> + unsigned int tree_depth; /* should be repositioned for packing? */ > >> + unsigned char layer; > >> + > > > > This looks very much like an optional feature. To avoid increasing > > pack-objects memory usage for common case, please move this to struct > > packing_data. > > As you can see the patch 6/6 (in the v2 of this patch series that I > just sent) moves `unsigned int tree_depth` from 'struct object_entry' > to 'struct packing_data'. I am not sure that I did it right and that > it is worth it though as it is a bit more complex. It is a bit more complex than I expected. But I think if you go with Jeff's suggestion (i.e. think of the new tree_depth array an extension of objects array) it's a bit simpler: you access both arrays using the same index, both arrays should have the same size and be realloc'd at the same time in packlist_alloc(). Is it worth it? The "pahole" comment in this file is up to date. We use 80 bytes per object. This series makes the struct 88 bytes (I've just rerun pahole). On linux repo with 12M objects, "git pack-objects --all" needs extra 96MB memory even if this feature is not used. So yes I still think it's worth moving these fields out of struct object_entry. -- Duy
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Aug 05, 2018 at 08:53:19PM +0200, Christian Couder wrote: > - 'layer' is used in add_to_write_order() which is called from many > places in add_descendants_to_write_order() and compute_write_order() > for example like this: > > for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) { > add_to_write_order(wo, endp, s); > } > > So it seems to me that moving 'layer' to 'struct packing_data' would > require changing the DELTA_SIBLING macros or adding a hack to easily > find the index in an array corresponding to a 'struct object_entry'. Right, that hack is how the various the existing memory-shrinking macros work. Every "struct object_entry *" that we deal with _must_ be in the list in "packing_data", so that we can compute an index as "entry - to_pack.objects". So I think Duy is just suggesting: void oe_set_layer(struct packing_data *pack, struct object_entry *entry, int layer) { if (!pack->layers) ALLOC_ARRAY(pack->layers, pack->nr_objects); pack->layers[entry - pack->objects] = layer; } int oe_get_layer(struct packing_data *pack, struct object_entry *entry) { if (!pack->layers) return 0; return pack->layers[entry - pack->nr_objects]; } That said, I wonder if need to cache the layer at all. We compute the layers all at once in the new compute_pack_layers(). But it is just a linear walk over the objects, asking for each "is this in the core island?". Could we just ask "is this in the core island?" when we consider each object? That's a hash lookup each time we ask instead of looking at our int, but I'd think we would only ask in linear proportion to the number of objects. So there's no algorithmic speedup, though maybe a constant-time one (for two layers, I'd expect we'd probably ask about each object twice). > - I don't see why it is different from other fields in 'struct > object_entry' that are also used in compute_write_order(), for > example: > > uint32_t delta_idx;/* delta base object */ > uint32_t delta_child_idx; /* deltified objects who bases me */ > uint32_t delta_sibling_idx; /* other deltified objects who ... */ > > Would we also want to move these fields to 'struct packing_data'? Why > or why not? If we want to move them, then it might be a good idea to > do all the moving at the same time to make sure we get something > coherent. We actually use those multiple times. They're reset and used in compute_write_order(), but I think we use them earlier as part of the delta search (at the very least we use delta_idx; maybe no the child/sibling stuff). -Peff
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen wrote: > On Sun, Jul 22, 2018 at 7:51 AM Christian Couder > wrote: >> diff --git a/pack-objects.h b/pack-objects.h >> index edf74dabdd..8eecd67991 100644 >> --- a/pack-objects.h >> +++ b/pack-objects.h >> @@ -100,6 +100,10 @@ struct object_entry { >> unsigned type_:TYPE_BITS; >> unsigned no_try_delta:1; >> unsigned in_pack_type:TYPE_BITS; /* could be delta */ >> + >> + unsigned int tree_depth; /* should be repositioned for packing? */ >> + unsigned char layer; >> + > > This looks very much like an optional feature. To avoid increasing > pack-objects memory usage for common case, please move this to struct > packing_data. As you can see the patch 6/6 (in the v2 of this patch series that I just sent) moves `unsigned int tree_depth` from 'struct object_entry' to 'struct packing_data'. I am not sure that I did it right and that it is worth it though as it is a bit more complex. About `unsigned char layer` I am even less sure that it's worth it for the following reasons: - 'struct object_entry' contains bit fields as its last members and then the following comment: /* * pahole results on 64-bit linux (gcc and clang) * * size: 80, bit_padding: 20 bits, holes: 8 bits * * and on 32-bit (gcc) * * size: 76, bit_padding: 20 bits, holes: 8 bits */ I am not sure if this comment is still up to date, but if it true maybe there is enough space in the padding to add 'layer' as a 8 bit bit field somewhere without increasing the size of 'struct object_entry'. - 'layer' is used in add_to_write_order() which is called from many places in add_descendants_to_write_order() and compute_write_order() for example like this: for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) { add_to_write_order(wo, endp, s); } So it seems to me that moving 'layer' to 'struct packing_data' would require changing the DELTA_SIBLING macros or adding a hack to easily find the index in an array corresponding to a 'struct object_entry'. - I don't see why it is different from other fields in 'struct object_entry' that are also used in compute_write_order(), for example: uint32_t delta_idx;/* delta base object */ uint32_t delta_child_idx; /* deltified objects who bases me */ uint32_t delta_sibling_idx; /* other deltified objects who ... */ Would we also want to move these fields to 'struct packing_data'? Why or why not? If we want to move them, then it might be a good idea to do all the moving at the same time to make sure we get something coherent. Thanks, Christian.
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Tue, Jul 24, 2018 at 09:47:29AM -0700, Junio C Hamano wrote: > > +/* > > + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a > > copy > > + * of "old". Otherwise, the new bitmap is empty. > > + */ > > +static struct island_bitmap *island_bitmap_new(const struct island_bitmap > > *old) > > +{ > > + size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4); > > + struct island_bitmap *b = xcalloc(1, size); > > Is one of the variants of flex array macros applicable here? Hmm, maybe. The tricky thing about this bitmap (and this touches on your "why another bitmap" question below) is that we have a _ton_ of them (one per object). So we want to do what we can to keep memory usage down. So there are two weird things: - these bitmaps do not know their own size. They are all exactly the same size (one bit per island), and we store the size only once in the whole program. - they are copy-on-write. This helps because there's a lot of locality to the maps within the object graph. But it also means sometimes we initialize empty, and sometimes from another array. So I think it might work to do: if (old) { /* copy bits from existing bitmap */ FLEX_ALLOC_MEM(b, bits, old->bits, st_mult(island_bitmap_size, 4)); } else { /* start with all-zero bits courtesy of xcalloc */ FLEX_ALLOC_MEM(b, bits, NULL, 0); } That'd still waste an extra byte per struct (or 4, I guess, due to padding), since we unconditionally NUL-terminate the flex-struct (since they were really designed around string storage). > > +static int island_bitmap_get(struct island_bitmap *self, uint32_t i) > > +{ > > + return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != > > 0; > > +} > > + > > Not necessarily a complaint, but do we need another implementation > of bitmap here, or the compressed bitmap used for the pack bitmap is > unsuited for the purpose of this thing (e.g. perhaps it is overkill, > as we won't be shooting for saving disk footprint of a bitmap that > we are not going to save on disk anyway)? Mostly it's about memory savings, as I discussed above. We _could_ use ewah bitmaps instead of naive ones here. Those are bad for random access to bits, but I think most of the operations are "is this bitmap a subset of this other one" which can be done by walking them linearly. I doubt the compression would be all that significant, though. They do best when there's a lot of locality in the bitmap. Here our bits are the islands, grouped and ordered by some subset of the refname. I don't think there's any reason to think that refs/heads/a and refs/heads/b would be correlated. OTOH, if most objects are in most islands (e.g., because the islands represent a bunch of forks of some "base" history), then you might get a big run of 1's for those objects. I don't think we really experimented with it. To be honest, I don't recall how much measuring we did for the refcounted bitmaps, either. So I'm not sure how much the copy-on-write saves (though my vague recollection is that it was worth doing if you have a lot of islands). > > + /* > > +* We process only trees, as commits and tags have already been handled > > +* (and passed their marks on to root trees, as well. We must make sure > > +* to process them in descending tree-depth order so that marks > > +* propagate down the tree properly, even if a sub-tree is found in > > +* multiple parent trees. > > +*/ > > + todo = xmalloc(to_pack->nr_objects * sizeof(*todo)); > > + for (i = 0; i < to_pack->nr_objects; i++) { > > + if (oe_type(_pack->objects[i]) == OBJ_TREE) > > + todo[nr++] = _pack->objects[i]; > > + } > > + qsort(todo, nr, sizeof(*todo), cmp_tree_depth); > > Hmph, at this stage nobody actually seems to set tree_depth; I am > wondering how a tree that is at the root in one commit but appears > as a subdirectory in another commit is handled by this code. The tree depth is set when we actually walk the graph collecting objects, and we use the max depth at which we've seen it. That said, this happens in show_object(), which I think we'd avoid entering twice for the same object (due to list-objects setting the SEEN flag). So I suspect that yes, you could have a case where a tree has multiple depths, and we'd visit them topologically out of order. And that could lead to objects reachable from that tree missing some of their islands. Something like: - refs/remotes/123/master points to commit C which points to tree X, which points to blob B - refs/remotes/456/master points to commit D which points to tree Y, which points to tree X, which points to blob B We mark C with island "123" and D with island "456", and ditto for their root trees X and Y. Then we sort the trees by depth, with the intent that Y would come before its child X. But due to the bogus depth, X may come first. We propagate the mark for 123 down
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Jul 22, 2018 at 07:48:33AM +0200, Christian Couder wrote: > + /* > + * We process only trees, as commits and tags have already been handled > + * (and passed their marks on to root trees, as well. We must make sure > + * to process them in descending tree-depth order so that marks > + * propagate down the tree properly, even if a sub-tree is found in > + * multiple parent trees. > + */ > + todo = xmalloc(to_pack->nr_objects * sizeof(*todo)); I was fiddling with "make coccicheck", and it looks like this code could stand some modernization. This could use ALLOC_ARRAY(). > + for (i = 0; i < to_pack->nr_objects; i++) { > + if (oe_type(_pack->objects[i]) == OBJ_TREE) > + todo[nr++] = _pack->objects[i]; > + } > + qsort(todo, nr, sizeof(*todo), cmp_tree_depth); And this QSORT(). There are a few others, I won't list them all. The only tricky one I see is: > + free(tree->buffer); > + tree->buffer = NULL; > + tree->object.parsed = 0; This suggests FREE_AND_NULL(), but I think it actually the whole block should become a call to free_tree_buffer(). -Peff
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
Christian Couder writes: > From: Jeff King > > Hosting providers that allow users to "fork" existing > repos want as much as possible those forks to use a > small amount of disk space. "... want those forks to consume as little amount of disk space as possible?" Or perhaps "... want those forks to share as much disk space as possible"? > Using alternates to keep all the objects from all the forks into a > unique central repo is way to do that, but ... s/is way to/is a way to/, I guess, but that makes it sound as if the series invented a way to share objects without using alternates. > it can have some > drawbacks. Especially when packing the central repo, deltas will > be created between objects from different forks. > > This can make cloning or fetching a fork much slower and much more > CPU intensive as Git might have to compute new deltas for many > objects to avoid sending objects from a different fork. > > Delta islands is a way to store objects from different forks in > the same repo and packfile without having deltas between objects > from different forks. I think another paragraph between the above two is needed to briefly outline the central idea (which would make it clear why that is called "delta islands") is called for. Perhaps Because the inefficiency primarily arises when an object is delitified against another object that does not exist in the same fork, we partion objects into sets that appear in the same fork, and define "delta islands". When finding delta base, we do not allow an object outside the same island to be considered as its base. or something along that line. Perhaps that would even make the last paragraph unnecessary. > +struct island_bitmap { > + uint32_t refcount; > + uint32_t bits[]; > +}; > + > +static uint32_t island_bitmap_size; > + > +/* > + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy > + * of "old". Otherwise, the new bitmap is empty. > + */ > +static struct island_bitmap *island_bitmap_new(const struct island_bitmap > *old) > +{ > + size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4); > + struct island_bitmap *b = xcalloc(1, size); Is one of the variants of flex array macros applicable here? > + if (old) > + memcpy(b, old, size); > + > + b->refcount = 1; > + return b; > +} > + > +static void island_bitmap_or(struct island_bitmap *a, const struct > island_bitmap *b) > +{ > + uint32_t i; > + > + for (i = 0; i < island_bitmap_size; ++i) > + a->bits[i] |= b->bits[i]; > +} > + > +static int island_bitmap_is_subset(struct island_bitmap *self, > + struct island_bitmap *super) > +{ > + uint32_t i; > + > + if (self == super) > + return 1; > + > + for (i = 0; i < island_bitmap_size; ++i) { > + if ((self->bits[i] & super->bits[i]) != self->bits[i]) > + return 0; > + } > + > + return 1; > +} > +#define ISLAND_BITMAP_BLOCK(x) (x / 32) > +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32)) > + > +static void island_bitmap_set(struct island_bitmap *self, uint32_t i) > +{ > + self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i); > +} > + > +static int island_bitmap_get(struct island_bitmap *self, uint32_t i) > +{ > + return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != > 0; > +} > + Not necessarily a complaint, but do we need another implementation of bitmap here, or the compressed bitmap used for the pack bitmap is unsuited for the purpose of this thing (e.g. perhaps it is overkill, as we won't be shooting for saving disk footprint of a bitmap that we are not going to save on disk anyway)? > +static int cmp_tree_depth(const void *va, const void *vb) > +{ > + struct object_entry *a = *(struct object_entry **)va; > + struct object_entry *b = *(struct object_entry **)vb; > + return a->tree_depth - b->tree_depth; > +} > + > +void resolve_tree_islands(int progress, struct packing_data *to_pack) > +{ > + struct progress *progress_state = NULL; > + struct object_entry **todo; > + int nr = 0; > + int i; > + > + if (!island_marks) > + return; > + > + /* > + * We process only trees, as commits and tags have already been handled > + * (and passed their marks on to root trees, as well. We must make sure > + * to process them in descending tree-depth order so that marks > + * propagate down the tree properly, even if a sub-tree is found in > + * multiple parent trees. > + */ > + todo = xmalloc(to_pack->nr_objects * sizeof(*todo)); > + for (i = 0; i < to_pack->nr_objects; i++) { > + if (oe_type(_pack->objects[i]) == OBJ_TREE) > + todo[nr++] = _pack->objects[i]; > + } > + qsort(todo, nr, sizeof(*todo), cmp_tree_depth); Hmph, at this stage nobody actually seems to set tree_depth; I am wondering
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen wrote: > On Sun, Jul 22, 2018 at 7:51 AM Christian Couder > wrote: >> +pack.island:: >> + A regular expression configuring a set of delta islands. See >> + "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details. >> + > > That section is not added until 3/5 though. Yeah, so I guess it is better to move this hunk to 3/5 and keep pack.island undocumented until the delta islands code is actually used by pack-objects. >> diff --git a/delta-islands.c b/delta-islands.c >> new file mode 100644 >> index 00..645fe966c5 >> --- /dev/null >> +++ b/delta-islands.c >> @@ -0,0 +1,490 @@ >> +#include "builtin.h" > > A bit weird that builtin.h would be needed... Yeah, I will get rid of that include in the next iteration. >> + if (progress) >> + progress_state = start_progress("Propagating island marks", >> nr); > > _() (same comment for other strings too) Ok, the strings will be marked for translation in the next iteration. >> diff --git a/pack-objects.h b/pack-objects.h >> index edf74dabdd..8eecd67991 100644 >> --- a/pack-objects.h >> +++ b/pack-objects.h >> @@ -100,6 +100,10 @@ struct object_entry { >> unsigned type_:TYPE_BITS; >> unsigned no_try_delta:1; >> unsigned in_pack_type:TYPE_BITS; /* could be delta */ >> + >> + unsigned int tree_depth; /* should be repositioned for packing? */ >> + unsigned char layer; >> + > > This looks very much like an optional feature. To avoid increasing > pack-objects memory usage for common case, please move this to struct > packing_data. Ok, I will take a look at that. Thanks for the review, Christian.
Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
On Sun, Jul 22, 2018 at 7:51 AM Christian Couder wrote: > diff --git a/Documentation/config.txt b/Documentation/config.txt > index a32172a43c..f682e92a1a 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2542,6 +2542,10 @@ Note that changing the compression level will not > automatically recompress > all existing objects. You can force recompression by passing the -F option > to linkgit:git-repack[1]. > > +pack.island:: > + A regular expression configuring a set of delta islands. See > + "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details. > + That section is not added until 3/5 though. > diff --git a/delta-islands.c b/delta-islands.c > new file mode 100644 > index 00..645fe966c5 > --- /dev/null > +++ b/delta-islands.c > @@ -0,0 +1,490 @@ > +#include "builtin.h" A bit weird that builtin.h would be needed... > +void resolve_tree_islands(int progress, struct packing_data *to_pack) > +{ > + struct progress *progress_state = NULL; > + struct object_entry **todo; > + int nr = 0; > + int i; > + > + if (!island_marks) > + return; > + > + /* > +* We process only trees, as commits and tags have already been > handled > +* (and passed their marks on to root trees, as well. We must make > sure > +* to process them in descending tree-depth order so that marks > +* propagate down the tree properly, even if a sub-tree is found in > +* multiple parent trees. > +*/ > + todo = xmalloc(to_pack->nr_objects * sizeof(*todo)); > + for (i = 0; i < to_pack->nr_objects; i++) { > + if (oe_type(_pack->objects[i]) == OBJ_TREE) > + todo[nr++] = _pack->objects[i]; > + } > + qsort(todo, nr, sizeof(*todo), cmp_tree_depth); > + > + if (progress) > + progress_state = start_progress("Propagating island marks", > nr); _() (same comment for other strings too) > diff --git a/pack-objects.h b/pack-objects.h > index edf74dabdd..8eecd67991 100644 > --- a/pack-objects.h > +++ b/pack-objects.h > @@ -100,6 +100,10 @@ struct object_entry { > unsigned type_:TYPE_BITS; > unsigned no_try_delta:1; > unsigned in_pack_type:TYPE_BITS; /* could be delta */ > + > + unsigned int tree_depth; /* should be repositioned for packing? */ > + unsigned char layer; > + This looks very much like an optional feature. To avoid increasing pack-objects memory usage for common case, please move this to struct packing_data. > unsigned preferred_base:1; /* > * we do not pack this, but is available > * to be used as the base object to delta > -- > 2.18.0.237.gffdb1dbdaa -- Duy
[RFC PATCH 2/5] Add delta-islands.{c,h}
From: Jeff King Hosting providers that allow users to "fork" existing repos want as much as possible those forks to use a small amount of disk space. Using alternates to keep all the objects from all the forks into a unique central repo is way to do that, but it can have some drawbacks. Especially when packing the central repo, deltas will be created between objects from different forks. This can make cloning or fetching a fork much slower and much more CPU intensive as Git might have to compute new deltas for many objects to avoid sending objects from a different fork. Delta islands is a way to store objects from different forks in the same repo and packfile without having deltas between objects from different forks. This patch implements the delta islands mechanism in "delta-islands.{c,h}", but does not yet make use of it. A few new fields are added in 'struct object_entry' in "pack-objects.h" though. The new "pack.island" config variable which can be used to configure the different islands is described a bit in "Documentation/config.txt", but the real documentation will follow in a patch that actually uses delta islands in "builtin/pack-objects.c". Signed-off-by: Jeff King Signed-off-by: Christian Couder --- Documentation/config.txt | 4 + Makefile | 1 + delta-islands.c | 490 +++ delta-islands.h | 11 + pack-objects.h | 4 + 5 files changed, 510 insertions(+) create mode 100644 delta-islands.c create mode 100644 delta-islands.h diff --git a/Documentation/config.txt b/Documentation/config.txt index a32172a43c..f682e92a1a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2542,6 +2542,10 @@ Note that changing the compression level will not automatically recompress all existing objects. You can force recompression by passing the -F option to linkgit:git-repack[1]. +pack.island:: + A regular expression configuring a set of delta islands. See + "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details. + pack.deltaCacheSize:: The maximum memory in bytes used for caching deltas in linkgit:git-pack-objects[1] before writing them out to a pack. diff --git a/Makefile b/Makefile index 08e5c54549..2804298977 100644 --- a/Makefile +++ b/Makefile @@ -840,6 +840,7 @@ LIB_OBJS += csum-file.o LIB_OBJS += ctype.o LIB_OBJS += date.o LIB_OBJS += decorate.o +LIB_OBJS += delta-islands.o LIB_OBJS += diffcore-break.o LIB_OBJS += diffcore-delta.o LIB_OBJS += diffcore-order.o diff --git a/delta-islands.c b/delta-islands.c new file mode 100644 index 00..645fe966c5 --- /dev/null +++ b/delta-islands.c @@ -0,0 +1,490 @@ +#include "builtin.h" +#include "cache.h" +#include "attr.h" +#include "object.h" +#include "blob.h" +#include "commit.h" +#include "tag.h" +#include "tree.h" +#include "delta.h" +#include "pack.h" +#include "tree-walk.h" +#include "diff.h" +#include "revision.h" +#include "list-objects.h" +#include "progress.h" +#include "refs.h" +#include "khash.h" +#include "pack-bitmap.h" +#include "pack-objects.h" +#include "delta-islands.h" +#include "sha1-array.h" +#include "config.h" + +KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal) + +static khash_sha1 *island_marks; +static unsigned island_counter; +static unsigned island_counter_core; + +static kh_str_t *remote_islands; + +struct remote_island { + uint64_t hash; + struct oid_array oids; +}; + +struct island_bitmap { + uint32_t refcount; + uint32_t bits[]; +}; + +static uint32_t island_bitmap_size; + +/* + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy + * of "old". Otherwise, the new bitmap is empty. + */ +static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old) +{ + size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4); + struct island_bitmap *b = xcalloc(1, size); + + if (old) + memcpy(b, old, size); + + b->refcount = 1; + return b; +} + +static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b) +{ + uint32_t i; + + for (i = 0; i < island_bitmap_size; ++i) + a->bits[i] |= b->bits[i]; +} + +static int island_bitmap_is_subset(struct island_bitmap *self, + struct island_bitmap *super) +{ + uint32_t i; + + if (self == super) + return 1; + + for (i = 0; i < island_bitmap_size; ++i) { + if ((self->bits[i] & super->bits[i]) != self->bits[i]) + return 0; + } + + return 1; +} + +#define ISLAND_BITMAP_BLOCK(x) (x / 32) +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32)) + +static void island_bitmap_set(struct island_bitmap *self, uint32_t i) +{ + self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i); +} + +static int island_bitmap_get(struct island_bitmap *self, uint32_t i) +{ +