Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
On Mon, Sep 12, 2016 at 11:23:18PM -0700, Junio C Hamano wrote: > Kirill Smelkov writes: > > > +static int want_found_object(int exclude, struct packed_git *p) > > +{ > > + if (exclude) > > + return 1; > > + if (incremental) > > + return 0; > > + > > + /* > > +* When asked to do --local (do not include an object that appears in a > > +* pack we borrow from elsewhere) or --honor-pack-keep (do not include > > +* an object that appears in a pack marked with .keep), finding a pack > > +* that matches the criteria is sufficient for us to decide to omit it. > > +* However, even if this pack does not satisfy the criteria, we need to > > +* make sure no copy of this object appears in _any_ pack that makes us > > +* to omit the object, so we need to check all the packs. > > +* > > +* We can however first check whether these options can possible matter; > > +* if they do not matter we know we want the object in generated pack. > > +* Otherwise, we signal "-1" at the end to tell the caller that we do > > +* not know either way, and it needs to check more packs. > > +*/ > > + if (!ignore_packed_keep && > > + (!local || !have_non_local_packs)) > > + return 1; > > + > > + if (local && !p->pack_local) > > + return 0; > > + if (ignore_packed_keep && p->pack_local && p->pack_keep) > > + return 0; > > + > > + /* we don't know yet; keep looking for more packs */ > > + return -1; > > +} > > Moving this logic out to this helper made the main logic in the > caller easier to grasp. > > > @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char > > *sha1, > >off_t *found_offset) > > { > > struct packed_git *p; > > + int want; > > > > if (!exclude && local && has_loose_object_nonlocal(sha1)) > > return 0; > > > > + /* > > +* If we already know the pack object lives in, start checks from that > > +* pack - in the usual case when neither --local was given nor .keep > > files > > +* are present we will determine the answer right now. > > +*/ > > + if (*found_pack) { > > + want = want_found_object(exclude, *found_pack); > > + if (want != -1) > > + return want; > > + } > > > > for (p = packed_git; p; p = p->next) { > > + off_t offset; > > + > > + if (p == *found_pack) > > + offset = *found_offset; > > + else > > + offset = find_pack_entry_one(sha1, p); > > + > > if (offset) { > > if (!*found_pack) { > > if (!is_pack_valid(p)) > > @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char > > *sha1, > > *found_offset = offset; > > *found_pack = p; > > } > > + want = want_found_object(exclude, p); > > + if (want != -1) > > + return want; > > } > > } > > As Peff noted in his earlier review, however, MRU code needed to be > grafted in to the caller (an update to the MRU list was done in the > code that was moved to the want_found_object() helper). I think I > did it correctly, which ended up looking like this: > > want = want_found_object(exclude, p); > if (!exclude && want > 0) > mru_mark(packed_git_mru, entry); > if (want != -1) > return want; > > I somewhat feel that it is ugly that the helper knows about exclude > (i.e. in the original code, we immediately returned 1 without > futzing with the MRU when we find an entry that is to be excluded, > which now is done in the helper), and the caller also knows about > exclude (i.e. the caller knows that the helper may return positive > in two cases, it knows that MRU marking needs to happen only one of > the two cases, and it also knows that "exclude" is what > differentiates between the two cases) at the same time. > > But probably the reason why I feel it ugly is only because I knew > how the original looked like. I dunno. Junio, the code above is correct semantic merge of pack-mru and my topic, because in pack-mru if found and exclude=1, 1 was returned without marking found pack. But I wonder: even if we exclude an object, we were still looking for it in packs, and when we found it, we found the corresponding pack too. So, that pack _was_ most-recently-used, and it is correct to mark it as MRU. We can do the simplification in the follow-up patch after the merge, so merge does not change semantics and it is all bisectable, etc. Jeff?
Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Kirill Smelkov writes: > +static int want_found_object(int exclude, struct packed_git *p) > +{ > + if (exclude) > + return 1; > + if (incremental) > + return 0; > + > + /* > + * When asked to do --local (do not include an object that appears in a > + * pack we borrow from elsewhere) or --honor-pack-keep (do not include > + * an object that appears in a pack marked with .keep), finding a pack > + * that matches the criteria is sufficient for us to decide to omit it. > + * However, even if this pack does not satisfy the criteria, we need to > + * make sure no copy of this object appears in _any_ pack that makes us > + * to omit the object, so we need to check all the packs. > + * > + * We can however first check whether these options can possible matter; > + * if they do not matter we know we want the object in generated pack. > + * Otherwise, we signal "-1" at the end to tell the caller that we do > + * not know either way, and it needs to check more packs. > + */ > + if (!ignore_packed_keep && > + (!local || !have_non_local_packs)) > + return 1; > + > + if (local && !p->pack_local) > + return 0; > + if (ignore_packed_keep && p->pack_local && p->pack_keep) > + return 0; > + > + /* we don't know yet; keep looking for more packs */ > + return -1; > +} Moving this logic out to this helper made the main logic in the caller easier to grasp. > @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char > *sha1, > off_t *found_offset) > { > struct packed_git *p; > + int want; > > if (!exclude && local && has_loose_object_nonlocal(sha1)) > return 0; > > + /* > + * If we already know the pack object lives in, start checks from that > + * pack - in the usual case when neither --local was given nor .keep > files > + * are present we will determine the answer right now. > + */ > + if (*found_pack) { > + want = want_found_object(exclude, *found_pack); > + if (want != -1) > + return want; > + } > > for (p = packed_git; p; p = p->next) { > + off_t offset; > + > + if (p == *found_pack) > + offset = *found_offset; > + else > + offset = find_pack_entry_one(sha1, p); > + > if (offset) { > if (!*found_pack) { > if (!is_pack_valid(p)) > @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char > *sha1, > *found_offset = offset; > *found_pack = p; > } > + want = want_found_object(exclude, p); > + if (want != -1) > + return want; > } > } As Peff noted in his earlier review, however, MRU code needed to be grafted in to the caller (an update to the MRU list was done in the code that was moved to the want_found_object() helper). I think I did it correctly, which ended up looking like this: want = want_found_object(exclude, p); if (!exclude && want > 0) mru_mark(packed_git_mru, entry); if (want != -1) return want; I somewhat feel that it is ugly that the helper knows about exclude (i.e. in the original code, we immediately returned 1 without futzing with the MRU when we find an entry that is to be excluded, which now is done in the helper), and the caller also knows about exclude (i.e. the caller knows that the helper may return positive in two cases, it knows that MRU marking needs to happen only one of the two cases, and it also knows that "exclude" is what differentiates between the two cases) at the same time. But probably the reason why I feel it ugly is only because I knew how the original looked like. I dunno.
[PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree - 5310.2: repack to disk9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree - 5310.2: repack to disk9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree - 5310.2: repack to disk9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 97 - t/t5310-pack-bitmaps.sh | 92 ++ 2 files changed, 156 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..19668d3 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,48 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* +* When asked to do --local (do not include an object that appears in a +* pack we borrow from elsewhere) or --honor-pack-keep (do not include +* an object that appears in a pack marked with .keep), finding a pack +* that matches the criteria is sufficient for us to decide to omit it. +* However, even if this pack does not satisfy the criteria, we need to +* make sure no copy of this object appears in _any_ pack that makes us +* to omit the object, so we need to check all the packs. +* +* We can however first check whether these options can possible matter; +* if they do not matter we know we want the object in generated pack. +* Otherwise, we signal "-1" at the end to tell the caller that we do +* not know either way, and it needs to check more packs. +*/ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local)