Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Kirill Smelkovwrites: > On Thu, Aug 18, 2016 at 01:52:22PM -0400, Jeff King wrote: > > > > Good to know there is no regression. It is curious that there is a > > slight _improvement_ across the board. Do we have an explanation for > > that? It seems odd that noise would be so consistent. > > Yes, I too thought it and it turned out to be t/perf/run does not copy > config.mak.autogen & friends to build/ and I'm using autoconf with > CFLAGS="-march=native -O3 ..." > > Junio, I could not resist to the following: > ... > With corrected t/perf/run the timings are more realistic - e.g. 3 > consecutive runs of `./run 56dfeb62 . ./p5310-pack-bitmaps.sh`: Wow, that's what I call an exchange with quality during a review ;-) Thanks for the curiosity and digging it to the root cause of the anomaly. Some GNUism/bashism in the way copying is spelled in the patch bothers me, but that is easily fixable. Thanks.
Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
On Thu, Aug 18, 2016 at 01:52:22PM -0400, Jeff King wrote: > On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote: > > > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > > are two codepaths in pack-objects: with & without using bitmap > > reachability index. > > Sorry, I got distracted from reviewing these patches. I'll give them a > detailed look now and hopefully we can finalize the topic. Jeff, thanks for feedback. On my side I'm sorry for the delay because I was travelling and only recently got back to work. > > In want_object_in_pack() we care to start the checks from already found > > pack, if we have one, this way determining the answer right away > > in case neither --local nor --honour-pack-keep are active. In > > particular, as p5310-pack-bitmaps.sh shows, we do not do harm to > > served-with-bitmap clones performance-wise: > > > > Test 56dfeb62 this tree > > - > > 5310.2: repack to disk9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% > > 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% > > 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% > > 5310.6: partial bitmap1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% > > > > with all differences strangely showing we are a bit faster now, but > > probably all being within noise. > > Good to know there is no regression. It is curious that there is a > slight _improvement_ across the board. Do we have an explanation for > that? It seems odd that noise would be so consistent. Yes, I too thought it and it turned out to be t/perf/run does not copy config.mak.autogen & friends to build/ and I'm using autoconf with CFLAGS="-march=native -O3 ..." Junio, I could not resist to the following: 8< From: Kirill SmelkovSubject: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area Otherwise for people who use autotools-based configure in main worktree, the performance testing results will be inconsistent as work and build trees could be using e.g. different optimization levels. See e.g. http://public-inbox.org/git/20160818175222.bmm3ivjheokf2...@sigill.intra.peff.net/ for example. NOTE config.status has to be copied because otherwise without it the build would want to run reconfigure this way loosing just copied config.mak.autogen. Signed-off-by: Kirill Smelkov --- t/perf/run | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/perf/run b/t/perf/run index cfd7012..aa383c2 100755 --- a/t/perf/run +++ b/t/perf/run @@ -30,7 +30,7 @@ unpack_git_rev () { } build_git_rev () { rev=$1 - cp ../../config.mak build/$rev/config.mak + cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status} (cd build/$rev && make $GIT_PERF_MAKE_OPTS) || die "failed to build revision '$mydir'" } -- 2.9.2.701.gf965a18.dirty 8< With corrected t/perf/run the timings are more realistic - e.g. 3 consecutive runs of `./run 56dfeb62 . ./p5310-pack-bitmaps.sh`: Test 56dfeb62 this tree - 5310.2: repack to disk9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree - 5310.2: repack to disk9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree - 5310.2: repack to disk9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% > > And in the general case we care not to have duplicate > > find_pack_entry_one(*found_pack) calls. Worst what can happen is we can > > call want_found_object(*found_pack) -- newly introduced helper for > > checking whether we want object -- twice, but since want_found_object() > > is very lightweight it does not make any difference. > > I had trouble parsing this. I think maybe: > > In the general case we do not want to call find_pack_entry_one() more > than once, because it is expensive. This patch splits the loop in > want_object_in_pack() into two parts: finding the object and seeing if > it impacts our choice to include it in the pack. We may
Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote: > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > are two codepaths in pack-objects: with & without using bitmap > reachability index. Sorry, I got distracted from reviewing these patches. I'll give them a detailed look now and hopefully we can finalize the topic. > In want_object_in_pack() we care to start the checks from already found > pack, if we have one, this way determining the answer right away > in case neither --local nor --honour-pack-keep are active. In > particular, as p5310-pack-bitmaps.sh shows, we do not do harm to > served-with-bitmap clones performance-wise: > > Test 56dfeb62 this tree > - > 5310.2: repack to disk9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% > 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% > 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% > 5310.6: partial bitmap1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% > > with all differences strangely showing we are a bit faster now, but > probably all being within noise. Good to know there is no regression. It is curious that there is a slight _improvement_ across the board. Do we have an explanation for that? It seems odd that noise would be so consistent. > And in the general case we care not to have duplicate > find_pack_entry_one(*found_pack) calls. Worst what can happen is we can > call want_found_object(*found_pack) -- newly introduced helper for > checking whether we want object -- twice, but since want_found_object() > is very lightweight it does not make any difference. I had trouble parsing this. I think maybe: In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. > +static int want_found_object(int exclude, struct packed_git *p) > +{ > + if (exclude) > + return 1; > + if (incremental) > + return 0; > + > + /* > + * When asked to do --local (do not include an object that appears in a > + * pack we borrow from elsewhere) or --honor-pack-keep (do not include > + * an object that appears in a pack marked with .keep), finding a pack > + * that matches the criteria is sufficient for us to decide to omit it. > + * However, even if this pack does not satisfy the criteria, we need to > + * make sure no copy of this object appears in _any_ pack that makes us > + * to omit the object, so we need to check all the packs. Signal that by > + * returning -1 to the caller. > + */ > + if (!ignore_packed_keep && > + (!local || !have_non_local_packs)) > + return 1; Hmm. The comment says "-1", but the return says "1". That is because the comment is describing the return that happens at the end. :) I wonder if the last sentence should be: We can check here whether these options can possibly matter; if not, we can return early from the function here. Otherwise, we signal "-1" at the end to tell the caller that we do not know either way, and it needs to check more packs. > - *found_pack = NULL; > - *found_offset = 0; > + /* > + * If we already know the pack object lives in, start checks from that > + * pack - in the usual case when neither --local was given nor .keep > files > + * are present we will determine the answer right now. > + */ > + if (*found_pack) { > + want = want_found_object(exclude, *found_pack); > + if (want != -1) > + return want; > + } Looks correct. Though it is not really "start checks from..." anymore, but rather "do a quick check to see if we can quit early, and otherwise start the loop". That might be nitpicking, though. > for (p = packed_git; p; p = p->next) { > - off_t offset = find_pack_entry_one(sha1, p); > + off_t offset; > + > + if (p == *found_pack) > + offset = *found_offset; > + else > + offset = find_pack_entry_one(sha1, p); > + This hunk will conflict with the MRU optimizations in 'next', but I think the resolution should be pretty trivial. > static int add_object_entry(const unsigned char *sha1, enum object_type type, > const char *name, int exclude) > { > - struct packed_git *found_pack; > - off_t found_offset; > + struct packed_git *found_pack = NULL; > + off_t found_offset = 0; I think technically we don't need to initialize found_offset here (it is considered only if *found_pack is not
[PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows, we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree - 5310.2: repack to disk9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% 5310.6: partial bitmap1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% with all differences strangely showing we are a bit faster now, but probably all being within noise. And in the general case we care not to have duplicate find_pack_entry_one(*found_pack) calls. Worst what can happen is we can call want_found_object(*found_pack) -- newly introduced helper for checking whether we want object -- twice, but since want_found_object() is very lightweight it does not make any difference. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov--- builtin/pack-objects.c | 93 +++-- t/t5310-pack-bitmaps.sh | 92 2 files changed, 152 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..b1007f2 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,44 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* +* When asked to do --local (do not include an object that appears in a +* pack we borrow from elsewhere) or --honor-pack-keep (do not include +* an object that appears in a pack marked with .keep), finding a pack +* that matches the criteria is sufficient for us to decide to omit it. +* However, even if this pack does not satisfy the criteria, we need to +* make sure no copy of this object appears in _any_ pack that makes us +* to omit the object, so we need to check all the packs. Signal that by +* returning -1 to the caller. +*/ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. + * If the caller already knows an existing pack it wants to take the object + * from, that is passed in *found_pack and *found_offset; otherwise this + * function finds if there is any pack that has the object and returns the pack + * and its offset in these variables. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +989,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local &&