Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use

2016-09-13 Thread Kirill Smelkov
On Mon, Sep 12, 2016 at 11:23:18PM -0700, Junio C Hamano wrote:
> Kirill Smelkov  writes:
> 
> > +static int want_found_object(int exclude, struct packed_git *p)
> > +{
> > +   if (exclude)
> > +   return 1;
> > +   if (incremental)
> > +   return 0;
> > +
> > +   /*
> > +* When asked to do --local (do not include an object that appears in a
> > +* pack we borrow from elsewhere) or --honor-pack-keep (do not include
> > +* an object that appears in a pack marked with .keep), finding a pack
> > +* that matches the criteria is sufficient for us to decide to omit it.
> > +* However, even if this pack does not satisfy the criteria, we need to
> > +* make sure no copy of this object appears in _any_ pack that makes us
> > +* to omit the object, so we need to check all the packs.
> > +*
> > +* We can however first check whether these options can possible matter;
> > +* if they do not matter we know we want the object in generated pack.
> > +* Otherwise, we signal "-1" at the end to tell the caller that we do
> > +* not know either way, and it needs to check more packs.
> > +*/
> > +   if (!ignore_packed_keep &&
> > +   (!local || !have_non_local_packs))
> > +   return 1;
> > +
> > +   if (local && !p->pack_local)
> > +   return 0;
> > +   if (ignore_packed_keep && p->pack_local && p->pack_keep)
> > +   return 0;
> > +
> > +   /* we don't know yet; keep looking for more packs */
> > +   return -1;
> > +}
> 
> Moving this logic out to this helper made the main logic in the
> caller easier to grasp.
> 
> > @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char 
> > *sha1,
> >off_t *found_offset)
> >  {
> > struct packed_git *p;
> > +   int want;
> >  
> > if (!exclude && local && has_loose_object_nonlocal(sha1))
> > return 0;
> >  
> > +   /*
> > +* If we already know the pack object lives in, start checks from that
> > +* pack - in the usual case when neither --local was given nor .keep 
> > files
> > +* are present we will determine the answer right now.
> > +*/
> > +   if (*found_pack) {
> > +   want = want_found_object(exclude, *found_pack);
> > +   if (want != -1)
> > +   return want;
> > +   }
> >  
> > for (p = packed_git; p; p = p->next) {
> > +   off_t offset;
> > +
> > +   if (p == *found_pack)
> > +   offset = *found_offset;
> > +   else
> > +   offset = find_pack_entry_one(sha1, p);
> > +
> > if (offset) {
> > if (!*found_pack) {
> > if (!is_pack_valid(p))
> > @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char 
> > *sha1,
> > *found_offset = offset;
> > *found_pack = p;
> > }
> > +   want = want_found_object(exclude, p);
> > +   if (want != -1)
> > +   return want;
> > }
> > }
> 
> As Peff noted in his earlier review, however, MRU code needed to be
> grafted in to the caller (an update to the MRU list was done in the
> code that was moved to the want_found_object() helper).  I think I
> did it correctly, which ended up looking like this:
> 
> want = want_found_object(exclude, p);
> if (!exclude && want > 0)
> mru_mark(packed_git_mru, entry);
> if (want != -1)
> return want;
> 
> I somewhat feel that it is ugly that the helper knows about exclude
> (i.e. in the original code, we immediately returned 1 without
> futzing with the MRU when we find an entry that is to be excluded,
> which now is done in the helper), and the caller also knows about
> exclude (i.e. the caller knows that the helper may return positive
> in two cases, it knows that MRU marking needs to happen only one of
> the two cases, and it also knows that "exclude" is what
> differentiates between the two cases) at the same time.
> 
> But probably the reason why I feel it ugly is only because I knew
> how the original looked like.  I dunno.

Junio, the code above is correct semantic merge of pack-mru and my
topic, because in pack-mru if found and exclude=1, 1 was returned
without marking found pack.

But I wonder: even if we exclude an object, we were still looking for it
in packs, and when we found it, we found the corresponding pack too. So,
that pack _was_ most-recently-used, and it is correct to mark it as MRU.

We can do the simplification in the follow-up patch after the merge, so
merge does not change semantics and it is all bisectable, etc.

Jeff?


Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use

2016-09-12 Thread Junio C Hamano
Kirill Smelkov  writes:

> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> + if (exclude)
> + return 1;
> + if (incremental)
> + return 0;
> +
> + /*
> +  * When asked to do --local (do not include an object that appears in a
> +  * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> +  * an object that appears in a pack marked with .keep), finding a pack
> +  * that matches the criteria is sufficient for us to decide to omit it.
> +  * However, even if this pack does not satisfy the criteria, we need to
> +  * make sure no copy of this object appears in _any_ pack that makes us
> +  * to omit the object, so we need to check all the packs.
> +  *
> +  * We can however first check whether these options can possible matter;
> +  * if they do not matter we know we want the object in generated pack.
> +  * Otherwise, we signal "-1" at the end to tell the caller that we do
> +  * not know either way, and it needs to check more packs.
> +  */
> + if (!ignore_packed_keep &&
> + (!local || !have_non_local_packs))
> + return 1;
> +
> + if (local && !p->pack_local)
> + return 0;
> + if (ignore_packed_keep && p->pack_local && p->pack_keep)
> + return 0;
> +
> + /* we don't know yet; keep looking for more packs */
> + return -1;
> +}

Moving this logic out to this helper made the main logic in the
caller easier to grasp.

> @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char 
> *sha1,
>  off_t *found_offset)
>  {
>   struct packed_git *p;
> + int want;
>  
>   if (!exclude && local && has_loose_object_nonlocal(sha1))
>   return 0;
>  
> + /*
> +  * If we already know the pack object lives in, start checks from that
> +  * pack - in the usual case when neither --local was given nor .keep 
> files
> +  * are present we will determine the answer right now.
> +  */
> + if (*found_pack) {
> + want = want_found_object(exclude, *found_pack);
> + if (want != -1)
> + return want;
> + }
>  
>   for (p = packed_git; p; p = p->next) {
> + off_t offset;
> +
> + if (p == *found_pack)
> + offset = *found_offset;
> + else
> + offset = find_pack_entry_one(sha1, p);
> +
>   if (offset) {
>   if (!*found_pack) {
>   if (!is_pack_valid(p))
> @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char 
> *sha1,
>   *found_offset = offset;
>   *found_pack = p;
>   }
> + want = want_found_object(exclude, p);
> + if (want != -1)
> + return want;
>   }
>   }

As Peff noted in his earlier review, however, MRU code needed to be
grafted in to the caller (an update to the MRU list was done in the
code that was moved to the want_found_object() helper).  I think I
did it correctly, which ended up looking like this:

want = want_found_object(exclude, p);
if (!exclude && want > 0)
mru_mark(packed_git_mru, entry);
if (want != -1)
return want;

I somewhat feel that it is ugly that the helper knows about exclude
(i.e. in the original code, we immediately returned 1 without
futzing with the MRU when we find an entry that is to be excluded,
which now is done in the helper), and the caller also knows about
exclude (i.e. the caller knows that the helper may return positive
in two cases, it knows that MRU marking needs to happen only one of
the two cases, and it also knows that "exclude" is what
differentiates between the two cases) at the same time.

But probably the reason why I feel it ugly is only because I knew
how the original looked like.  I dunno.





[PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use

2016-09-10 Thread Kirill Smelkov
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with & without using bitmap
reachability index.

However add_object_entry_from_bitmap(), despite its non-bitmapped
counterpart add_object_entry(), in no way does check for whether --local
or --honor-pack-keep or --incremental should be respected. In
non-bitmapped codepath this is handled in want_object_in_pack(), but
bitmapped codepath has simply no such checking at all.

The bitmapped codepath however was allowing to pass in all those options
and with bitmap indices still being used under such conditions -
potentially giving wrong output (e.g. including objects from non-local or
.keep'ed pack).

We can easily fix this by noting the following: when an object comes to
add_object_entry_from_bitmap() it can come for two reasons:

1. entries coming from main pack covered by bitmap index, and
2. object coming from, possibly alternate, loose or other packs.

"2" can be already handled by want_object_in_pack() and to cover
"1" we can teach want_object_in_pack() to expect that *found_pack can be
non-NULL, meaning calling client already found object's pack entry.

In want_object_in_pack() we care to start the checks from already found
pack, if we have one, this way determining the answer right away
in case neither --local nor --honour-pack-keep are active. In
particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do
not do harm to served-with-bitmap clones performance-wise:

Test  56dfeb62  this tree
-
5310.2: repack to disk9.08(8.20+0.25)   9.09(8.14+0.32) +0.1%
5310.3: simulated clone   1.92(2.12+0.08)   1.93(2.12+0.09) +0.5%
5310.4: simulated fetch   0.82(1.07+0.04)   0.82(1.06+0.04) +0.0%
5310.6: partial bitmap1.96(2.42+0.13)   1.95(2.40+0.15) -0.5%

Test  56dfeb62  this tree
-
5310.2: repack to disk9.11(8.16+0.32)   9.11(8.19+0.28) +0.0%
5310.3: simulated clone   1.93(2.14+0.07)   1.92(2.11+0.10) -0.5%
5310.4: simulated fetch   0.82(1.06+0.04)   0.82(1.04+0.05) +0.0%
5310.6: partial bitmap1.95(2.38+0.16)   1.94(2.39+0.14) -0.5%

Test  56dfeb62  this tree
-
5310.2: repack to disk9.13(8.17+0.31)   9.07(8.13+0.28) -0.7%
5310.3: simulated clone   1.92(2.13+0.07)   1.91(2.12+0.06) -0.5%
5310.4: simulated fetch   0.82(1.08+0.03)   0.82(1.08+0.03) +0.0%
5310.6: partial bitmap1.96(2.43+0.14)   1.96(2.42+0.14) +0.0%

with delta timings showing they are all within noise from run to run.

In the general case we do not want to call find_pack_entry_one() more than
once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if it
impacts our choice to include it in the pack. We may call the inexpensive
want_found_object() twice, but we will never call find_pack_entry_one() if we
do not need to.

I appreciate help and discussing this change with Junio C Hamano and
Jeff King.

Signed-off-by: Kirill Smelkov 
Signed-off-by: Junio C Hamano 
---
 builtin/pack-objects.c  | 97 -
 t/t5310-pack-bitmaps.sh | 92 ++
 2 files changed, 156 insertions(+), 33 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c4c2a3c..19668d3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -944,13 +944,48 @@ static int have_duplicate_entry(const unsigned char *sha1,
return 1;
 }
 
+static int want_found_object(int exclude, struct packed_git *p)
+{
+   if (exclude)
+   return 1;
+   if (incremental)
+   return 0;
+
+   /*
+* When asked to do --local (do not include an object that appears in a
+* pack we borrow from elsewhere) or --honor-pack-keep (do not include
+* an object that appears in a pack marked with .keep), finding a pack
+* that matches the criteria is sufficient for us to decide to omit it.
+* However, even if this pack does not satisfy the criteria, we need to
+* make sure no copy of this object appears in _any_ pack that makes us
+* to omit the object, so we need to check all the packs.
+*
+* We can however first check whether these options can possible matter;
+* if they do not matter we know we want the object in generated pack.
+* Otherwise, we signal "-1" at the end to tell the caller that we do
+* not know either way, and it needs to check more packs.
+*/
+   if (!ignore_packed_keep &&
+   (!local || !have_non_local_packs))
+   return 1;
+
+   if (local && !p->pack_local)