Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote: > On Tue, Aug 21 2018, Jeff King wrote: > > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, > > +const unsigned char *sha1) > > +{ > > + int pos; > > + > > + if (!bitmap_git) > > + return 0; /* no bitmap loaded */ > > + if (!bitmap_git->result) > > + BUG("failed to perform bitmap walk before querying"); > > Some part of what calls this completely breaks pushing from the "next" > branch when you have local bitmaps (we *really* should have some tests > for this...). Yikes, thanks for reporting. I agree we need better tests here. This assertion is totally bogus, as traverse_bitmap_commit_list() will actually free (and NULL) the result field! This is a holdover from the original bitmap code, where bitmap_git was a static, and this is how you might prepare a second walk (though AFAIK only ever do one walk per process). These days prepare_bitmap_walk() actually returns a fresh bitmap_index struct. So there's no way to know if traverse_bitmap_commit_list() was called. But as it turns out, we don't care. We want to know if "bitmap_git->have" was set up, which is done in prepare_bitmap_walk(). So there's no way[1] to get a bitmap_git that hasn't been properly set up. This BUG() check can just go away. I'll prepare a patch and some tests later tonight. -Peff [1] Actually, there is also prepare_bitmap_git(), but it is not really for general use by callers. It should be made static, or better yet, I suspect it can be folded into its callers.
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Tue, Aug 21 2018, Jeff King wrote: > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, > + const unsigned char *sha1) > +{ > + int pos; > + > + if (!bitmap_git) > + return 0; /* no bitmap loaded */ > + if (!bitmap_git->result) > + BUG("failed to perform bitmap walk before querying"); Some part of what calls this completely breaks pushing from the "next" branch when you have local bitmaps (we *really* should have some tests for this...). I don't have time to dig now, but just on my local git.git on next @ b1634b371dc2e46f9b43c45fd1857c2e2688f96e: u git (next $=) $ git repack -Ad --window=10 --depth=10 --write-bitmap-index --pack-kept-objects Enumerating objects: 616909, done. Counting objects: 100% (616909/616909), done. Delta compression using up to 8 threads Compressing objects: 100% (146475/146475), done. Writing objects: 100% (616909/616909), done. Selecting bitmap commits: 168027, done. BBuilding bitmaps: 100% (338/338), done. Total 616909 (delta 497494), reused 582609 (delta 467530) u git (next $=) $ ./git --exec-path=$PWD push avar next:avar/next-push Enumerating objects: 1330, done. BUG: pack-bitmap.c:1132: failed to perform bitmap walk before querying error: pack-objects died of signal 6 error: pack-objects died of signal 6 error: remote unpack failed: eof before pack header was fully read error: failed to push some refs to 'g...@github.com:avar/git.git' Removing the bitmap makes it work again: u git (next $=) $ rm .git/objects/pack/pack-496088d9464cd79dfcac50dd0d72dcd6faee8253.bitmap rm: remove write-protected regular file '.git/objects/pack/pack-496088d9464cd79dfcac50dd0d72dcd6faee8253.bitmap'? y u git (next $=) $ ./git --exec-path=$PWD push avar next:avar/next-push Enumerating objects: 2834, done. Counting objects: 100% (1799/1799), done. Delta compression using up to 8 threads Compressing objects: 100% (591/591), done. Writing objects: 100% (1390/1390), 430.11 KiB | 15.36 MiB/s, done. Total 1390 (delta 1100), reused 1072 (delta 798) remote: Resolving deltas: 100% (1100/1100), completed with 214 local objects. To github.com:avar/git.git * [new branch]next -> avar/next-push Today I deployed next + my patches @ work. Broke pushes in one place where repacks were being done manually with --write-bitmap-index.
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Tue, Aug 21, 2018 at 03:47:36PM -0400, Derrick Stolee wrote: > On 8/21/2018 3:07 PM, Jeff King wrote: > > When we do a bitmap walk, we save the result, which > > represents (WANTs & ~HAVEs); i.e., every object we care > > about visiting in our walk. However, we throw away the > > haves bitmap, which can sometimes be useful, too. Save it > > and provide an access function so code which has performed a > > walk can query it. > > This makes a lot of sense. Based on the amount of time the "Counting > Objects" blog post [1] spent talking about delta bases, I would have assumed > this "haves" bitmap was already part of it. But, I can also see how it could > be dropped if you are focusing on performance for 'git clone'. I think that blog post was written after we were already using this patch. ;) > > A few notes on the accessor interface: > > > > - the bitmap code calls these "haves" because it grew out > > of the want/have negotiation for fetches. But really, > > these are simply the objects that would be flagged > > UNINTERESTING in a regular traversal. Let's use that > > more universal nomenclature for the external module > > interface. We may want to change the internal naming > > inside the bitmap code, but that's outside the scope of > > this patch. > > I saw the uninteresting-vs-haves name confusion in the patch below, but I > agree with your logic here. > > Sorry that I'm late to the party, but I was interested in the topic. This has already missed the freeze for v2.19, so I think there is plenty of time. More review would be welcome. -Peff
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On 8/21/2018 3:07 PM, Jeff King wrote: When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. This makes a lot of sense. Based on the amount of time the "Counting Objects" blog post [1] spent talking about delta bases, I would have assumed this "haves" bitmap was already part of it. But, I can also see how it could be dropped if you are focusing on performance for 'git clone'. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. I saw the uninteresting-vs-haves name confusion in the patch below, but I agree with your logic here. Sorry that I'm late to the party, but I was interested in the topic. Thanks, -Stolee [1] https://githubengineering.com/counting-objects/ "Counting Objects" by Vincent Martí
[PATCH 5/6] pack-bitmap: save "have" bitmap from walk
When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. - it still uses a bare "sha1" rather than "oid". That's true of all of the bitmap code. And in this particular instance, our caller in pack-objects is dealing with the bare sha1 that comes from a packed REF_DELTA (we're pointing directly to the mmap'd pack on disk). That's something we'll have to deal with as we transition to a new hash, but we can wait and see how the caller ends up being fixed and adjust this interface accordingly. Signed-off-by: Jeff King --- pack-bitmap.c | 25 - pack-bitmap.h | 7 +++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index f0a1937a1c..c3231ef9ef 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -86,6 +86,9 @@ struct bitmap_index { /* Bitmap result of the last performed walk */ struct bitmap *result; + /* "have" bitmap from the last performed walk */ + struct bitmap *haves; + /* Version of the bitmap index */ unsigned int version; @@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) bitmap_and_not(wants_bitmap, haves_bitmap); bitmap_git->result = wants_bitmap; + bitmap_git->haves = haves_bitmap; - bitmap_free(haves_bitmap); return bitmap_git; cleanup: @@ -1114,5 +1117,25 @@ void free_bitmap_index(struct bitmap_index *b) free(b->ext_index.objects); free(b->ext_index.hashes); bitmap_free(b->result); + bitmap_free(b->haves); free(b); } + +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, +const unsigned char *sha1) +{ + int pos; + + if (!bitmap_git) + return 0; /* no bitmap loaded */ + if (!bitmap_git->result) + BUG("failed to perform bitmap walk before querying"); + if (!bitmap_git->haves) + return 0; /* walk had no "haves" */ + + pos = bitmap_position_packfile(bitmap_git, sha1); + if (pos < 0) + return 0; + + return bitmap_get(bitmap_git->haves, pos); +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 8a04741e12..c633bf5238 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -53,6 +53,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping khash_sha1 *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); +/* + * After a traversal has been performed on the bitmap_index, this can be + * queried to see if a particular object was reachable from any of the + * objects flagged as UNINTERESTING. + */ +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned char *sha1); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, -- 2.19.0.rc0.398.g138a08f6f6
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Fri, Aug 17, 2018 at 03:39:29PM -0700, Stefan Beller wrote: > > diff --git a/pack-bitmap.h b/pack-bitmap.h > > index 4555907dee..02a60ce670 100644 > > --- a/pack-bitmap.h > > +++ b/pack-bitmap.h > > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, > > struct packing_data *mapping > > khash_sha1 *reused_bitmaps, int show_progress); > > void free_bitmap_index(struct bitmap_index *); > > > > +/* > > + * After a traversal has been performed on the bitmap_index, this can be > > + * queried to see if a particular object was reachable from any of the > > + * objects flagged as UNINTERESTING. > > If the traversal has not been performed, we pretend the > object was not reachable? If the traversal hasn't been performed, the results are not defined (though I suspect yeah, it happens to say "no"). > Is this a good API design, as it can be used when you do not > have done all preparations? similarly to prepare_bitmap_walk > we could have > > if (!bitmap_git->result) > BUG("failed to perform bitmap walk before querying"); That seems like a reasonable precaution. > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > > char *sha1); > > You seem to have rebased it to master resolving conflicts only. ;-) > Do we want to talk about object ids here instead? See the discussion in the commit message. -Peff
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
> diff --git a/pack-bitmap.h b/pack-bitmap.h > index 4555907dee..02a60ce670 100644 > --- a/pack-bitmap.h > +++ b/pack-bitmap.h > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct > packing_data *mapping > khash_sha1 *reused_bitmaps, int show_progress); > void free_bitmap_index(struct bitmap_index *); > > +/* > + * After a traversal has been performed on the bitmap_index, this can be > + * queried to see if a particular object was reachable from any of the > + * objects flagged as UNINTERESTING. If the traversal has not been performed, we pretend the object was not reachable? Is this a good API design, as it can be used when you do not have done all preparations? similarly to prepare_bitmap_walk we could have if (!bitmap_git->result) BUG("failed to perform bitmap walk before querying"); > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > char *sha1); You seem to have rebased it to master resolving conflicts only. ;-) Do we want to talk about object ids here instead? (This is what I get to think about when reviewing this series "bottom up". I use "git log -w -p master..HEAD" after applying the patches, probably I should also use --reverse, such that I get to see the commit message before the code for each commit and yet only need to scroll in one direction.)
[PATCH 5/6] pack-bitmap: save "have" bitmap from walk
When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. - it still uses a bare "sha1" rather than "oid". That's true of all of the bitmap code. And in this particular instance, our caller in pack-objects is dealing with the bare sha1 that comes from a packed REF_DELTA (we're pointing directly to the mmap'd pack on disk). That's something we'll have to deal with as we transition to a new hash, but we can wait and see how the caller ends up being fixed and adjust this interface accordingly. Signed-off-by: Jeff King --- Funny story: the earlier version of this series called it bitmap_have(). That caused a bug later when somebody tried to build on it, thinking it was "does the bitmap have this object in the result". Oops. Hence the more descriptive name. pack-bitmap.c | 23 ++- pack-bitmap.h | 7 +++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index f0a1937a1c..76fd93a3de 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -86,6 +86,9 @@ struct bitmap_index { /* Bitmap result of the last performed walk */ struct bitmap *result; + /* "have" bitmap from the last performed walk */ + struct bitmap *haves; + /* Version of the bitmap index */ unsigned int version; @@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) bitmap_and_not(wants_bitmap, haves_bitmap); bitmap_git->result = wants_bitmap; + bitmap_git->haves = haves_bitmap; - bitmap_free(haves_bitmap); return bitmap_git; cleanup: @@ -1114,5 +1117,23 @@ void free_bitmap_index(struct bitmap_index *b) free(b->ext_index.objects); free(b->ext_index.hashes); bitmap_free(b->result); + bitmap_free(b->haves); free(b); } + +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, +const unsigned char *sha1) +{ + int pos; + + if (!bitmap_git) + return 0; /* no bitmap loaded */ + if (!bitmap_git->haves) + return 0; /* walk had no "haves" */ + + pos = bitmap_position_packfile(bitmap_git, sha1); + if (pos < 0) + return 0; + + return bitmap_get(bitmap_git->haves, pos); +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 4555907dee..02a60ce670 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping khash_sha1 *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); +/* + * After a traversal has been performed on the bitmap_index, this can be + * queried to see if a particular object was reachable from any of the + * objects flagged as UNINTERESTING. + */ +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned char *sha1); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, -- 2.18.0.1205.g3878b1e64a
[PATCH 5/6] pack-bitmap: save have bitmap from walk
When we do a bitmap walk, we save the result, which represents (WANTs ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. Signed-off-by: Jeff King p...@peff.net --- pack-bitmap.c | 21 - pack-bitmap.h | 2 ++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index a31e529..f7d417b 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -86,6 +86,9 @@ static struct bitmap_index { /* Bitmap result of the last performed walk */ struct bitmap *result; + /* have bitmap from the last performed walk */ + struct bitmap *haves; + /* Version of the bitmap index */ unsigned int version; @@ -769,8 +772,8 @@ int prepare_bitmap_walk(struct rev_info *revs) bitmap_and_not(wants_bitmap, haves_bitmap); bitmap_git.result = wants_bitmap; + bitmap_git.haves = haves_bitmap; - bitmap_free(haves_bitmap); return 0; } @@ -1097,3 +1100,19 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, bitmap_free(rebuild); return 0; } + +int bitmap_have(const unsigned char *sha1) +{ + int pos; + + if (!bitmap_git.loaded) + return 0; /* no bitmap loaded */ + if (!bitmap_git.haves) + return 0; /* walk had no haves */ + + pos = bitmap_position_packfile(sha1); + if (pos 0) + return 0; + + return bitmap_get(bitmap_git.haves, pos); +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 8b7f4e9..a63ee6b 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -49,6 +49,8 @@ int prepare_bitmap_walk(struct rev_info *revs); int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *entries, off_t *up_to); int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress); +int bitmap_have(const unsigned char *sha1); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr); -- 1.9.1.601.g7ec968e -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html