Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-31 Thread Jeff King
On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Aug 21 2018, Jeff King wrote:
> 
> > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> > +const unsigned char *sha1)
> > +{
> > +   int pos;
> > +
> > +   if (!bitmap_git)
> > +   return 0; /* no bitmap loaded */
> > +   if (!bitmap_git->result)
> > +   BUG("failed to perform bitmap walk before querying");
> 
> Some part of what calls this completely breaks pushing from the "next"
> branch when you have local bitmaps (we *really* should have some tests
> for this...).

Yikes, thanks for reporting. I agree we need better tests here.

This assertion is totally bogus, as traverse_bitmap_commit_list() will
actually free (and NULL) the result field! This is a holdover from the
original bitmap code, where bitmap_git was a static, and this is how you
might prepare a second walk (though AFAIK only ever do one walk per
process). These days prepare_bitmap_walk() actually returns a fresh
bitmap_index struct.

So there's no way to know if traverse_bitmap_commit_list() was called.
But as it turns out, we don't care. We want to know if
"bitmap_git->have" was set up, which is done in prepare_bitmap_walk().
So there's no way[1] to get a bitmap_git that hasn't been properly set
up. This BUG() check can just go away.

I'll prepare a patch and some tests later tonight.

-Peff

[1] Actually, there is also prepare_bitmap_git(), but it is not really
for general use by callers. It should be made static, or better yet,
I suspect it can be folded into its callers.


Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-31 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 21 2018, Jeff King wrote:

> +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> +  const unsigned char *sha1)
> +{
> + int pos;
> +
> + if (!bitmap_git)
> + return 0; /* no bitmap loaded */
> + if (!bitmap_git->result)
> + BUG("failed to perform bitmap walk before querying");

Some part of what calls this completely breaks pushing from the "next"
branch when you have local bitmaps (we *really* should have some tests
for this...).

I don't have time to dig now, but just on my local git.git on next @
b1634b371dc2e46f9b43c45fd1857c2e2688f96e:

u git (next $=) $ git repack -Ad --window=10 --depth=10 
--write-bitmap-index --pack-kept-objects
Enumerating objects: 616909, done.
Counting objects: 100% (616909/616909), done.
Delta compression using up to 8 threads
Compressing objects: 100% (146475/146475), done.
Writing objects: 100% (616909/616909), done.
Selecting bitmap commits: 168027, done. 

  BBuilding bitmaps: 100% 
(338/338), done.
Total 616909 (delta 497494), reused 582609 (delta 467530)
u git (next $=) $ ./git --exec-path=$PWD push avar next:avar/next-push
Enumerating objects: 1330, done.
BUG: pack-bitmap.c:1132: failed to perform bitmap walk before querying
error: pack-objects died of signal 6
error: pack-objects died of signal 6
error: remote unpack failed: eof before pack header was fully read
error: failed to push some refs to 'g...@github.com:avar/git.git'

Removing the bitmap makes it work again:

u git (next $=) $ rm 
.git/objects/pack/pack-496088d9464cd79dfcac50dd0d72dcd6faee8253.bitmap
rm: remove write-protected regular file 
'.git/objects/pack/pack-496088d9464cd79dfcac50dd0d72dcd6faee8253.bitmap'? y
u git (next $=) $ ./git --exec-path=$PWD push avar next:avar/next-push
Enumerating objects: 2834, done.
Counting objects: 100% (1799/1799), done.
Delta compression using up to 8 threads
Compressing objects: 100% (591/591), done.
Writing objects: 100% (1390/1390), 430.11 KiB | 15.36 MiB/s, done.
Total 1390 (delta 1100), reused 1072 (delta 798)
remote: Resolving deltas: 100% (1100/1100), completed with 214 local 
objects.
To github.com:avar/git.git
 * [new branch]next -> avar/next-push

Today I deployed next + my patches @ work. Broke pushes in one place
where repacks were being done manually with --write-bitmap-index.


Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 03:47:36PM -0400, Derrick Stolee wrote:

> On 8/21/2018 3:07 PM, Jeff King wrote:
> > When we do a bitmap walk, we save the result, which
> > represents (WANTs & ~HAVEs); i.e., every object we care
> > about visiting in our walk. However, we throw away the
> > haves bitmap, which can sometimes be useful, too. Save it
> > and provide an access function so code which has performed a
> > walk can query it.
> 
> This makes a lot of sense. Based on the amount of time the "Counting
> Objects" blog post [1] spent talking about delta bases, I would have assumed
> this "haves" bitmap was already part of it. But, I can also see how it could
> be dropped if you are focusing on performance for 'git clone'.

I think that blog post was written after we were already using this
patch. ;)

> > A few notes on the accessor interface:
> > 
> >   - the bitmap code calls these "haves" because it grew out
> > of the want/have negotiation for fetches. But really,
> > these are simply the objects that would be flagged
> > UNINTERESTING in a regular traversal. Let's use that
> > more universal nomenclature for the external module
> > interface. We may want to change the internal naming
> > inside the bitmap code, but that's outside the scope of
> > this patch.
> 
> I saw the uninteresting-vs-haves name confusion in the patch below, but I
> agree with your logic here.
> 
> Sorry that I'm late to the party, but I was interested in the topic.

This has already missed the freeze for v2.19, so I think there is plenty
of time. More review would be welcome.

-Peff


Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Derrick Stolee

On 8/21/2018 3:07 PM, Jeff King wrote:

When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.


This makes a lot of sense. Based on the amount of time the "Counting 
Objects" blog post [1] spent talking about delta bases, I would have 
assumed this "haves" bitmap was already part of it. But, I can also see 
how it could be dropped if you are focusing on performance for 'git clone'.




A few notes on the accessor interface:

  - the bitmap code calls these "haves" because it grew out
of the want/have negotiation for fetches. But really,
these are simply the objects that would be flagged
UNINTERESTING in a regular traversal. Let's use that
more universal nomenclature for the external module
interface. We may want to change the internal naming
inside the bitmap code, but that's outside the scope of
this patch.


I saw the uninteresting-vs-haves name confusion in the patch below, but 
I agree with your logic here.


Sorry that I'm late to the party, but I was interested in the topic.

Thanks,

-Stolee

[1] https://githubengineering.com/counting-objects/

    "Counting Objects" by Vincent Martí



[PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Jeff King
When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.

A few notes on the accessor interface:

 - the bitmap code calls these "haves" because it grew out
   of the want/have negotiation for fetches. But really,
   these are simply the objects that would be flagged
   UNINTERESTING in a regular traversal. Let's use that
   more universal nomenclature for the external module
   interface. We may want to change the internal naming
   inside the bitmap code, but that's outside the scope of
   this patch.

 - it still uses a bare "sha1" rather than "oid". That's
   true of all of the bitmap code. And in this particular
   instance, our caller in pack-objects is dealing with the
   bare sha1 that comes from a packed REF_DELTA (we're
   pointing directly to the mmap'd pack on disk). That's
   something we'll have to deal with as we transition to a
   new hash, but we can wait and see how the caller ends up
   being fixed and adjust this interface accordingly.

Signed-off-by: Jeff King 
---
 pack-bitmap.c | 25 -
 pack-bitmap.h |  7 +++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f0a1937a1c..c3231ef9ef 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -86,6 +86,9 @@ struct bitmap_index {
/* Bitmap result of the last performed walk */
struct bitmap *result;
 
+   /* "have" bitmap from the last performed walk */
+   struct bitmap *haves;
+
/* Version of the bitmap index */
unsigned int version;
 
@@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info 
*revs)
bitmap_and_not(wants_bitmap, haves_bitmap);
 
bitmap_git->result = wants_bitmap;
+   bitmap_git->haves = haves_bitmap;
 
-   bitmap_free(haves_bitmap);
return bitmap_git;
 
 cleanup:
@@ -1114,5 +1117,25 @@ void free_bitmap_index(struct bitmap_index *b)
free(b->ext_index.objects);
free(b->ext_index.hashes);
bitmap_free(b->result);
+   bitmap_free(b->haves);
free(b);
 }
+
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
+const unsigned char *sha1)
+{
+   int pos;
+
+   if (!bitmap_git)
+   return 0; /* no bitmap loaded */
+   if (!bitmap_git->result)
+   BUG("failed to perform bitmap walk before querying");
+   if (!bitmap_git->haves)
+   return 0; /* walk had no "haves" */
+
+   pos = bitmap_position_packfile(bitmap_git, sha1);
+   if (pos < 0)
+   return 0;
+
+   return bitmap_get(bitmap_git->haves, pos);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8a04741e12..c633bf5238 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -53,6 +53,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct 
packing_data *mapping
 khash_sha1 *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
 
+/*
+ * After a traversal has been performed on the bitmap_index, this can be
+ * queried to see if a particular object was reachable from any of the
+ * objects flagged as UNINTERESTING.
+ */
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned 
char *sha1);
+
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
-- 
2.19.0.rc0.398.g138a08f6f6



Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-17 Thread Jeff King
On Fri, Aug 17, 2018 at 03:39:29PM -0700, Stefan Beller wrote:

> > diff --git a/pack-bitmap.h b/pack-bitmap.h
> > index 4555907dee..02a60ce670 100644
> > --- a/pack-bitmap.h
> > +++ b/pack-bitmap.h
> > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, 
> > struct packing_data *mapping
> >  khash_sha1 *reused_bitmaps, int show_progress);
> >  void free_bitmap_index(struct bitmap_index *);
> >
> > +/*
> > + * After a traversal has been performed on the bitmap_index, this can be
> > + * queried to see if a particular object was reachable from any of the
> > + * objects flagged as UNINTERESTING.
> 
> If the traversal has not been performed, we pretend the
> object was not reachable?

If the traversal hasn't been performed, the results are not defined
(though I suspect yeah, it happens to say "no").

> Is this a good API design, as it can be used when you do not
> have done all preparations? similarly to prepare_bitmap_walk
> we could have
> 
> if (!bitmap_git->result)
> BUG("failed to perform bitmap walk before querying");

That seems like a reasonable precaution.

> > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned 
> > char *sha1);
> 
> You seem to have rebased it to master resolving conflicts only. ;-)
> Do we want to talk about object ids here instead?

See the discussion in the commit message.

-Peff


Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-17 Thread Stefan Beller
> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index 4555907dee..02a60ce670 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct 
> packing_data *mapping
>  khash_sha1 *reused_bitmaps, int show_progress);
>  void free_bitmap_index(struct bitmap_index *);
>
> +/*
> + * After a traversal has been performed on the bitmap_index, this can be
> + * queried to see if a particular object was reachable from any of the
> + * objects flagged as UNINTERESTING.

If the traversal has not been performed, we pretend the
object was not reachable?

Is this a good API design, as it can be used when you do not
have done all preparations? similarly to prepare_bitmap_walk
we could have

if (!bitmap_git->result)
BUG("failed to perform bitmap walk before querying");

> +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned 
> char *sha1);

You seem to have rebased it to master resolving conflicts only. ;-)
Do we want to talk about object ids here instead?

(This is what I get to think about when reviewing this series
"bottom up". I use "git log -w -p master..HEAD" after applying
the patches, probably I should also use --reverse, such that I
get to see the commit message before the code for each commit
and yet only need to scroll in one direction.)


[PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-17 Thread Jeff King
When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.

A few notes on the accessor interface:

 - the bitmap code calls these "haves" because it grew out
   of the want/have negotiation for fetches. But really,
   these are simply the objects that would be flagged
   UNINTERESTING in a regular traversal. Let's use that
   more universal nomenclature for the external module
   interface. We may want to change the internal naming
   inside the bitmap code, but that's outside the scope of
   this patch.

 - it still uses a bare "sha1" rather than "oid". That's
   true of all of the bitmap code. And in this particular
   instance, our caller in pack-objects is dealing with the
   bare sha1 that comes from a packed REF_DELTA (we're
   pointing directly to the mmap'd pack on disk). That's
   something we'll have to deal with as we transition to a
   new hash, but we can wait and see how the caller ends up
   being fixed and adjust this interface accordingly.

Signed-off-by: Jeff King 
---
Funny story: the earlier version of this series called it bitmap_have().
That caused a bug later when somebody tried to build on it, thinking it
was "does the bitmap have this object in the result". Oops. Hence the
more descriptive name.

 pack-bitmap.c | 23 ++-
 pack-bitmap.h |  7 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f0a1937a1c..76fd93a3de 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -86,6 +86,9 @@ struct bitmap_index {
/* Bitmap result of the last performed walk */
struct bitmap *result;
 
+   /* "have" bitmap from the last performed walk */
+   struct bitmap *haves;
+
/* Version of the bitmap index */
unsigned int version;
 
@@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info 
*revs)
bitmap_and_not(wants_bitmap, haves_bitmap);
 
bitmap_git->result = wants_bitmap;
+   bitmap_git->haves = haves_bitmap;
 
-   bitmap_free(haves_bitmap);
return bitmap_git;
 
 cleanup:
@@ -1114,5 +1117,23 @@ void free_bitmap_index(struct bitmap_index *b)
free(b->ext_index.objects);
free(b->ext_index.hashes);
bitmap_free(b->result);
+   bitmap_free(b->haves);
free(b);
 }
+
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
+const unsigned char *sha1)
+{
+   int pos;
+
+   if (!bitmap_git)
+   return 0; /* no bitmap loaded */
+   if (!bitmap_git->haves)
+   return 0; /* walk had no "haves" */
+
+   pos = bitmap_position_packfile(bitmap_git, sha1);
+   if (pos < 0)
+   return 0;
+
+   return bitmap_get(bitmap_git->haves, pos);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 4555907dee..02a60ce670 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct 
packing_data *mapping
 khash_sha1 *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
 
+/*
+ * After a traversal has been performed on the bitmap_index, this can be
+ * queried to see if a particular object was reachable from any of the
+ * objects flagged as UNINTERESTING.
+ */
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned 
char *sha1);
+
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
-- 
2.18.0.1205.g3878b1e64a



[PATCH 5/6] pack-bitmap: save have bitmap from walk

2014-03-26 Thread Jeff King
When we do a bitmap walk, we save the result, which
represents (WANTs  ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.

Signed-off-by: Jeff King p...@peff.net
---
 pack-bitmap.c | 21 -
 pack-bitmap.h |  2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a31e529..f7d417b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -86,6 +86,9 @@ static struct bitmap_index {
/* Bitmap result of the last performed walk */
struct bitmap *result;
 
+   /* have bitmap from the last performed walk */
+   struct bitmap *haves;
+
/* Version of the bitmap index */
unsigned int version;
 
@@ -769,8 +772,8 @@ int prepare_bitmap_walk(struct rev_info *revs)
bitmap_and_not(wants_bitmap, haves_bitmap);
 
bitmap_git.result = wants_bitmap;
+   bitmap_git.haves = haves_bitmap;
 
-   bitmap_free(haves_bitmap);
return 0;
 }
 
@@ -1097,3 +1100,19 @@ int rebuild_existing_bitmaps(struct packing_data 
*mapping,
bitmap_free(rebuild);
return 0;
 }
+
+int bitmap_have(const unsigned char *sha1)
+{
+   int pos;
+
+   if (!bitmap_git.loaded)
+   return 0; /* no bitmap loaded */
+   if (!bitmap_git.haves)
+   return 0; /* walk had no haves */
+
+   pos = bitmap_position_packfile(sha1);
+   if (pos  0)
+   return 0;
+
+   return bitmap_get(bitmap_git.haves, pos);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8b7f4e9..a63ee6b 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -49,6 +49,8 @@ int prepare_bitmap_walk(struct rev_info *revs);
 int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t 
*entries, off_t *up_to);
 int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 
*reused_bitmaps, int show_progress);
 
+int bitmap_have(const unsigned char *sha1);
+
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
 void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t 
index_nr);
-- 
1.9.1.601.g7ec968e

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html