Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
On Thu, Jul 26, 2018 at 2:59 PM Eric Sunshine wrote: > On Thu, Jul 26, 2018 at 11:04 AM Junio C Hamano wrote: > > If there were a simple and futureproof way to tell the option > > parsing loop to notice any feature other than "-b newbranch" was > > used, then such a whitelisting may be a viable way [...] > > I'm wondering if a two-stage parse-options invocations could make this > potential maintenance problem more manageable by altogether > eliminating needs_working_tree_merge(). A downside of this approach is that it too becomes a nightmare if git-checkout grows additional special cases like the proposed "-b", in which case multi-stage or disjoint or intersecting parse-options invocations might arise. Another downside is that this parse-options idea is somewhat invasive, whereas, needs_working_tree_merge(), despite its ugliness, is at least is self-contained and not at all invasive.
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
On Thu, Jul 26, 2018 at 11:04 AM Junio C Hamano wrote: > Ben Peart writes: > > I'm not thrilled with the long list either (the plethora of comments > > probably makes it appear worse than it is) but I don't see how... > > If there were a simple and futureproof way to tell the option > parsing loop to notice any feature other than "-b newbranch" was > used, then such a whitelisting may be a viable way, but there is > not, and the whitelist condition can become (over time---we are > talking about futureproofing and not "a filter that happens to match > today's feature set") just as complex as this blacklisting function > is (e.g. feature A and feature B when used alone may be compatible > with the optimization but not when used both), so if we were to use > this optimization, I think this long list of special-case checks is > the best we could do. I'm wondering if a two-stage parse-options invocations could make this potential maintenance problem more manageable by altogether eliminating needs_working_tree_merge(). Something very roughly along the lines of: new_branch_and_passive_options = { OPT_STRING('b', NULL, ...), ...options which can't impact "optimization" decision... }; argc = parse_options(argc, argv, prefix, new_branch_and_passive_options, NULL, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH); can_optimize_new_branch = 1; for (i = 0; i < argc; i++) if (argv[i][0] == '-') { can_optimize_new_branch = 0; break; } options = { ...all other options... } argc = parse_options(argc, argv, prefix, options, checkout_usage, PARSE_OPT_KEEP_DASHDASH); ...as before... The can_optimize_new_branch check could, of course, be fooled by a non-option which starts with a "-", but that would err toward safety of not optimizing, so shouldn't be a worry.
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
Ben Peart writes: [jc: it was a bit surprising that Eric covered all the bits I covered while we were writing without peeking each other's ;-)] >> This long list of special-case checks doesn't leave me too enthused, >> however, that aside, this approach seems backward. Rather than erring >> on the side of safety by falling back to the merging behavior, it errs >> in the other direction, ... > > I'm not thrilled with the long list either (the plethora of comments > probably makes it appear worse than it is) but I don't see how... If there were a simple and futureproof way to tell the option parsing loop to notice any feature other than "-b newbranch" was used, then such a whitelisting may be a viable way, but there is not, and the whitelist condition can become (over time---we are talking about futureproofing and not "a filter that happens to match today's feature set") just as complex as this blacklisting function is (e.g. feature A and feature B when used alone may be compatible with the optimization but not when used both), so if we were to use this optimization, I think this long list of special-case checks is the best we could do. >> if (!skip_worktree_merge(...)) >> ret = merge_working_tree(...); >> > > I personally agree, it was moved to its current location per feedback > the first time around. Perhaps with the addition of the config > setting it will be better received moved out to the caller. Sounds sensible. I am still not enthused by the configuration variable that is underdocumented. You already said that we know that this optimization does a wrong thing when sparse checkout is used---any other cases? I do not think of any myself, and if that is true, I am wondering if it makes more sense to "do we have sparse configuration?" as part of the blacklist condition. That way, the users do not have to set anything and they will get an optimization benefit from an obvious change to skip "read-tree -m HEAD HEAD" that ought to be a no-op. Thanks.
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
On 7/24/2018 3:21 PM, Junio C Hamano wrote: Ben Peart writes: From: Ben Peart If the new core.optimizecheckout config setting is set to true, speed up "git checkout -b foo" by avoiding the work to merge the working tree. This is valid because no merge needs to occur - only creating the new branch/ updating the refs. Any other options force it through the old code path. This change in behavior is off by default and behind the config setting so that users have to opt-in to the optimized behavior. We've been running with this patch internally for a long time but it was rejected when I submitted it to the mailing list before because it implicitly changes the behavior of checkout -b. Trying it again configured behind a config setting as a potential solution for other optimizations to checkout that could change the behavior as well. https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469 An incorrect link? It does not look like a thread that explains what was previously submitted but failed. The last paragraph looks like a fine material below the three-dash line. See my earlier reply about this section in: https://public-inbox.org/git/xmqqh8koxwwi@gitster-ct.c.googlers.com/T/#mb31136a09dbc1a963a5a62e840b118ac33043edf Signed-off-by: Ben Peart --- Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/f43d934ce7 Checkout: git fetch https://github.com/benpeart/git checkout-b-v1 && git checkout f43d934ce7 Documentation/config.txt | 6 +++ builtin/checkout.c | 94 cache.h | 1 + config.c | 5 +++ environment.c| 1 + 5 files changed, 107 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index a32172a43c..2c4f513bf1 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -911,6 +911,12 @@ core.commitGraph:: Enable git commit graph feature. Allows reading from the commit-graph file. +core.optimizedCheckout + Speed up "git checkout -b foo" by skipping much of the work of a + full checkout command. This changs the behavior as it will skip + merging the trees and updating the index and instead only create + and switch to the new ref. By the way, why is it a core.* thing, not checkout.* thing? I followed the naming convention used by core.sparsecheckout but I'm happy to call it whatever people want. If a new feature is not necessarily recommendable for normal users and it needs to be hidden behind an opt-in knob (I do not have a strong opinion if that is or is not the case for this particular feature at this point), the documentation for the knob should give a bit more than "This chang(e)s the behavior" to the readers, I would think, to be intellectually honest ;-). Let's tell them what bad things happen if we pretend that we switched the branch without twoway merge and the index update to help them make an informed decision. I attempted to explain what the change in behavior was in the same sentence by saying what it skips and what it still does. If that isn't intellectually honest, help me know how to better explain it so that it is. Is this better? Speed up "git checkout -b " by skipping the twoway merge and index update. Instead, just create a new branch named and switch to it. The working directory and index are left unchanged. +static int needs_working_tree_merge(const struct checkout_opts *opts, + const struct branch_info *old_branch_info, + const struct branch_info *new_branch_info) +{ + /* +* We must do the merge if we are actually moving to a new +* commit tree. What's a "commit tree"? Shouldn't it be just a "commit"? Sure +*/ + if (!old_branch_info->commit || !new_branch_info->commit || + oidcmp(&old_branch_info->commit->object.oid, &new_branch_info->commit->object.oid)) + return 1; + + /* +* opts->patch_mode cannot be used with switching branches so is +* not tested here +*/ + + /* +* opts->quiet only impacts output so doesn't require a merge +*/ + + /* +* Honor the explicit request for a three-way merge or to throw away +* local changes +*/ + if (opts->merge || opts->force) + return 1; + + /* +* --detach is documented as "updating the index and the files in the +* working tree" but this optimization skips those steps so fall through +* to the regular code path. +*/ + if (opts->force_detach) + return 1; + + /* +* opts->writeout_stage cannot be used with switching branches so is +* not tested here +*/ + + /* +* Honor the explicit ignore requests +*/ + if (!opts->over
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
On 7/24/2018 2:42 PM, Eric Sunshine wrote: On Tue, Jul 24, 2018 at 2:01 PM Ben Peart wrote: If the new core.optimizecheckout config setting is set to true, speed up Maybe: Add core.optimizeCheckout config setting which, when true, speeds up Sure "git checkout -b foo" by avoiding the work to merge the working tree. This is valid because no merge needs to occur - only creating the new branch/ updating the refs. Any other options force it through the old code path. This change in behavior is off by default and behind the config setting so that users have to opt-in to the optimized behavior. We've been running with this patch internally for a long time but it was rejected when I submitted it to the mailing list before because it implicitly changes the behavior of checkout -b. Trying it again configured behind a config setting as a potential solution for other optimizations to checkout that could change the behavior as well. This paragraph is mere commentary which probably belongs below the "---" line following your sign-off. Hopefully this commentary (I'll move it below the --- line) is clearer: We've been running with this patch internally for a long time but it was rejected when I submitted it to the mailing list before [1] because it implicitly changes the behavior of checkout -b as it no longer updates the working directory. I'm submitting it again behind a config setting so that it doesn't cause any back compat issues unless the user explicitly opts in to the new behavior. My hope is this same setting and model can be used if/when we make other performance optimizations to checkout like using the cache tree to avoid having to traverse the entire tree being discussed [2]. [1] https://public-inbox.org/git/20160909192520.4812-1-benpe...@microsoft.com/ [2] https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469 https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469 Is this link meant to reference the previous attempt of optimizing "checkout -b"? Although there's a single mention of "checkout -b" in that discussion, it doesn't seem to be the previous attempt or explain why it was rejected. It would be quite nice to see a discussion in both the commit message and the documentation about the pros and cons of enabling this optimization. That it was previously rejected suggests that there may be serious or unexpected consequences. How will a typical user know whether its use is desirable or not? Signed-off-by: Ben Peart --- diff --git a/Documentation/config.txt b/Documentation/config.txt @@ -911,6 +911,12 @@ core.commitGraph:: +core.optimizedCheckout + Speed up "git checkout -b foo" by skipping much of the work of a + full checkout command. This changs the behavior as it will skip s/changs/changes/ + merging the trees and updating the index and instead only create + and switch to the new ref. diff --git a/builtin/checkout.c b/builtin/checkout.c @@ -471,6 +475,88 @@ static void setup_branch_path(struct branch_info *branch) +static int needs_working_tree_merge(const struct checkout_opts *opts, + const struct branch_info *old_branch_info, + const struct branch_info *new_branch_info) +{ + /* +* We must do the merge if we are actually moving to a new +* commit tree. +*/ + if (!old_branch_info->commit || !new_branch_info->commit || + oidcmp(&old_branch_info->commit->object.oid, &new_branch_info->commit->object.oid)) + return 1; + [...] + return 0; +} This long list of special-case checks doesn't leave me too enthused, however, that aside, this approach seems backward. Rather than erring on the side of safety by falling back to the merging behavior, it errs in the other direction, which may be a problem if this list of special-case checks ever gets out of sync with 'checkout_opts'. That is, if someone adds a new option which ought to employ the merging behavior, but forgets to update this function, then this function will incorrectly default to using the optimization. I'm not thrilled with the long list either (the plethora of comments probably makes it appear worse than it is) but I don't see how flipping the logic around makes it fail if someone adds a new option. The "meets all criteria for optimization" code can only test existing options. A safer approach would be the inverse, namely: static int skip_worktree_merge(...) { if (...meets all criteria for optimization...) return 1; return 0; } static int merge_working_tree(const struct checkout_opts *opts, struct branch_info *old_branch_info, struct branch_info *new_branch_info, { + /* +* Skip merging the trees, updating the index, and work tree only i
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
Ben Peart writes: > From: Ben Peart > > If the new core.optimizecheckout config setting is set to true, speed up > "git checkout -b foo" by avoiding the work to merge the working tree. This > is valid because no merge needs to occur - only creating the new branch/ > updating the refs. Any other options force it through the old code path. > > This change in behavior is off by default and behind the config setting so > that users have to opt-in to the optimized behavior. > We've been running with this patch internally for a long time but it was > rejected when I submitted it to the mailing list before because it > implicitly changes the behavior of checkout -b. Trying it again configured > behind a config setting as a potential solution for other optimizations to > checkout that could change the behavior as well. > > https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469 An incorrect link? It does not look like a thread that explains what was previously submitted but failed. The last paragraph looks like a fine material below the three-dash line. > Signed-off-by: Ben Peart > --- > > Notes: > Base Ref: master > Web-Diff: https://github.com/benpeart/git/commit/f43d934ce7 > Checkout: git fetch https://github.com/benpeart/git checkout-b-v1 && git > checkout f43d934ce7 > > Documentation/config.txt | 6 +++ > builtin/checkout.c | 94 > cache.h | 1 + > config.c | 5 +++ > environment.c| 1 + > 5 files changed, 107 insertions(+) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index a32172a43c..2c4f513bf1 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -911,6 +911,12 @@ core.commitGraph:: > Enable git commit graph feature. Allows reading from the > commit-graph file. > > +core.optimizedCheckout > + Speed up "git checkout -b foo" by skipping much of the work of a > + full checkout command. This changs the behavior as it will skip > + merging the trees and updating the index and instead only create > + and switch to the new ref. By the way, why is it a core.* thing, not checkout.* thing? If a new feature is not necessarily recommendable for normal users and it needs to be hidden behind an opt-in knob (I do not have a strong opinion if that is or is not the case for this particular feature at this point), the documentation for the knob should give a bit more than "This chang(e)s the behavior" to the readers, I would think, to be intellectually honest ;-). Let's tell them what bad things happen if we pretend that we switched the branch without twoway merge and the index update to help them make an informed decision. > +static int needs_working_tree_merge(const struct checkout_opts *opts, > + const struct branch_info *old_branch_info, > + const struct branch_info *new_branch_info) > +{ > + /* > + * We must do the merge if we are actually moving to a new > + * commit tree. What's a "commit tree"? Shouldn't it be just a "commit"? > + */ > + if (!old_branch_info->commit || !new_branch_info->commit || > + oidcmp(&old_branch_info->commit->object.oid, > &new_branch_info->commit->object.oid)) > + return 1; > + > + /* > + * opts->patch_mode cannot be used with switching branches so is > + * not tested here > + */ > + > + /* > + * opts->quiet only impacts output so doesn't require a merge > + */ > + > + /* > + * Honor the explicit request for a three-way merge or to throw away > + * local changes > + */ > + if (opts->merge || opts->force) > + return 1; > + > + /* > + * --detach is documented as "updating the index and the files in the > + * working tree" but this optimization skips those steps so fall through > + * to the regular code path. > + */ > + if (opts->force_detach) > + return 1; > + > + /* > + * opts->writeout_stage cannot be used with switching branches so is > + * not tested here > + */ > + > + /* > + * Honor the explicit ignore requests > + */ > + if (!opts->overwrite_ignore || opts->ignore_skipworktree || > + opts->ignore_other_worktrees) > + return 1; > + > + /* > + * opts->show_progress only impacts output so doesn't require a merge > + */ > + > + /* > + * If we aren't creating a new branch any changes or updates will > + * happen in the existing branch. Since that could only be updating > + * the index and working directory, we don't want to skip those steps > + * or we've defeated any purpose in running the command. > + */ > + if (!opts->new_branch) > + return 1; > + > + /* > + * new_branch_force is defined to "create/reset and checkout a branch" > +
Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"
On Tue, Jul 24, 2018 at 2:01 PM Ben Peart wrote: > If the new core.optimizecheckout config setting is set to true, speed up Maybe: Add core.optimizeCheckout config setting which, when true, speeds up > "git checkout -b foo" by avoiding the work to merge the working tree. This > is valid because no merge needs to occur - only creating the new branch/ > updating the refs. Any other options force it through the old code path. > > This change in behavior is off by default and behind the config setting so > that users have to opt-in to the optimized behavior. > > We've been running with this patch internally for a long time but it was > rejected when I submitted it to the mailing list before because it > implicitly changes the behavior of checkout -b. Trying it again configured > behind a config setting as a potential solution for other optimizations to > checkout that could change the behavior as well. This paragraph is mere commentary which probably belongs below the "---" line following your sign-off. > https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469 Is this link meant to reference the previous attempt of optimizing "checkout -b"? Although there's a single mention of "checkout -b" in that discussion, it doesn't seem to be the previous attempt or explain why it was rejected. It would be quite nice to see a discussion in both the commit message and the documentation about the pros and cons of enabling this optimization. That it was previously rejected suggests that there may be serious or unexpected consequences. How will a typical user know whether its use is desirable or not? > Signed-off-by: Ben Peart > --- > diff --git a/Documentation/config.txt b/Documentation/config.txt > @@ -911,6 +911,12 @@ core.commitGraph:: > +core.optimizedCheckout > + Speed up "git checkout -b foo" by skipping much of the work of a > + full checkout command. This changs the behavior as it will skip s/changs/changes/ > + merging the trees and updating the index and instead only create > + and switch to the new ref. > diff --git a/builtin/checkout.c b/builtin/checkout.c > @@ -471,6 +475,88 @@ static void setup_branch_path(struct branch_info *branch) > +static int needs_working_tree_merge(const struct checkout_opts *opts, > + const struct branch_info *old_branch_info, > + const struct branch_info *new_branch_info) > +{ > + /* > +* We must do the merge if we are actually moving to a new > +* commit tree. > +*/ > + if (!old_branch_info->commit || !new_branch_info->commit || > + oidcmp(&old_branch_info->commit->object.oid, > &new_branch_info->commit->object.oid)) > + return 1; > + [...] > + return 0; > +} This long list of special-case checks doesn't leave me too enthused, however, that aside, this approach seems backward. Rather than erring on the side of safety by falling back to the merging behavior, it errs in the other direction, which may be a problem if this list of special-case checks ever gets out of sync with 'checkout_opts'. That is, if someone adds a new option which ought to employ the merging behavior, but forgets to update this function, then this function will incorrectly default to using the optimization. A safer approach would be the inverse, namely: static int skip_worktree_merge(...) { if (...meets all criteria for optimization...) return 1; return 0; } > static int merge_working_tree(const struct checkout_opts *opts, > struct branch_info *old_branch_info, > struct branch_info *new_branch_info, > { > + /* > +* Skip merging the trees, updating the index, and work tree only if > we > +* are simply creating a new branch via "git checkout -b foo." Any > +* other options or usage will continue to do all these steps. > +*/ > + if (core_optimize_checkout && !needs_working_tree_merge(opts, > old_branch_info, new_branch_info)) > + return 0; This seems a somewhat odd place to hook in this optimization, especially as there is only a single caller of this function. Instead, one might expect the caller itself to make this judgment and avoid trying the merge in the first place if not needed. That is, in switch_branches: if (!skip_worktree_merge(...)) ret = merge_working_tree(...);