subject:"Re\: \[PATCH v1\] checkout\: optionally speed up \"git checkout \-b foo\""

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-26 Thread Eric Sunshine

On Thu, Jul 26, 2018 at 2:59 PM Eric Sunshine  wrote:
> On Thu, Jul 26, 2018 at 11:04 AM Junio C Hamano  wrote:
> > If there were a simple and futureproof way to tell the option
> > parsing loop to notice any feature other than "-b newbranch" was
> > used, then such a whitelisting may be a viable way [...]
>
> I'm wondering if a two-stage parse-options invocations could make this
> potential maintenance problem more manageable by altogether
> eliminating needs_working_tree_merge().

A downside of this approach is that it too becomes a nightmare if
git-checkout grows additional special cases like the proposed "-b", in
which case multi-stage or disjoint or intersecting parse-options
invocations might arise. Another downside is that this parse-options
idea is somewhat invasive, whereas, needs_working_tree_merge(),
despite its ugliness, is at least is self-contained and not at all
invasive.

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-26 Thread Eric Sunshine

On Thu, Jul 26, 2018 at 11:04 AM Junio C Hamano  wrote:
> Ben Peart  writes:
> > I'm not thrilled with the long list either (the plethora of comments
> > probably makes it appear worse than it is) but I don't see how...
>
> If there were a simple and futureproof way to tell the option
> parsing loop to notice any feature other than "-b newbranch" was
> used, then such a whitelisting may be a viable way, but there is
> not, and the whitelist condition can become (over time---we are
> talking about futureproofing and not "a filter that happens to match
> today's feature set") just as complex as this blacklisting function
> is (e.g. feature A and feature B when used alone may be compatible
> with the optimization but not when used both), so if we were to use
> this optimization, I think this long list of special-case checks is
> the best we could do.

I'm wondering if a two-stage parse-options invocations could make this
potential maintenance problem more manageable by altogether
eliminating needs_working_tree_merge(). Something very roughly along
the lines of:

new_branch_and_passive_options = {
OPT_STRING('b', NULL, ...),
...options which can't impact "optimization" decision...
};
argc = parse_options(argc, argv, prefix,
new_branch_and_passive_options, NULL,
PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);

can_optimize_new_branch = 1;
for (i = 0; i < argc; i++)
if (argv[i][0] == '-') {
can_optimize_new_branch = 0;
break;
}

options = {
...all other options...
}
argc = parse_options(argc, argv, prefix, options,
checkout_usage, PARSE_OPT_KEEP_DASHDASH);

...as before...

The can_optimize_new_branch check could, of course, be fooled by a
non-option which starts with a "-", but that would err toward safety
of not optimizing, so shouldn't be a worry.

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-26 Thread Junio C Hamano

Ben Peart  writes:

[jc: it was a bit surprising that Eric covered all the bits I
covered while we were writing without peeking each other's ;-)]

>> This long list of special-case checks doesn't leave me too enthused,
>> however, that aside, this approach seems backward. Rather than erring
>> on the side of safety by falling back to the merging behavior, it errs
>> in the other direction, ...
>
> I'm not thrilled with the long list either (the plethora of comments
> probably makes it appear worse than it is) but I don't see how...

If there were a simple and futureproof way to tell the option
parsing loop to notice any feature other than "-b newbranch" was
used, then such a whitelisting may be a viable way, but there is
not, and the whitelist condition can become (over time---we are
talking about futureproofing and not "a filter that happens to match
today's feature set") just as complex as this blacklisting function
is (e.g. feature A and feature B when used alone may be compatible
with the optimization but not when used both), so if we were to use
this optimization, I think this long list of special-case checks is
the best we could do.

>>  if (!skip_worktree_merge(...))
>>  ret = merge_working_tree(...);
>>
>
> I personally agree, it was moved to its current location per feedback
> the first time around.  Perhaps with the addition of the config
> setting it will be better received moved out to the caller.

Sounds sensible.  I am still not enthused by the configuration
variable that is underdocumented.  You already said that we know
that this optimization does a wrong thing when sparse checkout is
used---any other cases?  I do not think of any myself, and if that
is true, I am wondering if it makes more sense to "do we have sparse
configuration?" as part of the blacklist condition.  That way, the
users do not have to set anything and they will get an optimization
benefit from an obvious change to skip "read-tree -m HEAD HEAD" that
ought to be a no-op.

Thanks.

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-24 Thread Ben Peart





On 7/24/2018 3:21 PM, Junio C Hamano wrote:

Ben Peart  writes:


From: Ben Peart 

If the new core.optimizecheckout config setting is set to true, speed up
"git checkout -b foo" by avoiding the work to merge the working tree.  This
is valid because no merge needs to occur - only creating the new branch/
updating the refs. Any other options force it through the old code path.

This change in behavior is off by default and behind the config setting so
that users have to opt-in to the optimized behavior.






We've been running with this patch internally for a long time but it was
rejected when I submitted it to the mailing list before because it
implicitly changes the behavior of checkout -b. Trying it again configured
behind a config setting as a potential solution for other optimizations to
checkout that could change the behavior as well.

https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469


An incorrect link?  It does not look like a thread that explains
what was previously submitted but failed.  The last paragraph looks
like a fine material below the three-dash line.



See my earlier reply about this section in:

https://public-inbox.org/git/xmqqh8koxwwi@gitster-ct.c.googlers.com/T/#mb31136a09dbc1a963a5a62e840b118ac33043edf




Signed-off-by: Ben Peart 
---

Notes:
 Base Ref: master
 Web-Diff: https://github.com/benpeart/git/commit/f43d934ce7
 Checkout: git fetch https://github.com/benpeart/git checkout-b-v1 && git 
checkout f43d934ce7

  Documentation/config.txt |  6 +++
  builtin/checkout.c   | 94 
  cache.h  |  1 +
  config.c |  5 +++
  environment.c|  1 +
  5 files changed, 107 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a32172a43c..2c4f513bf1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -911,6 +911,12 @@ core.commitGraph::
Enable git commit graph feature. Allows reading from the
commit-graph file.
  
+core.optimizedCheckout

+   Speed up "git checkout -b foo" by skipping much of the work of a
+   full checkout command.  This changs the behavior as it will skip
+   merging the trees and updating the index and instead only create
+   and switch to the new ref.


By the way, why is it a core.* thing, not checkout.* thing?



I followed the naming convention used by core.sparsecheckout but I'm 
happy to call it whatever people want.



If a new feature is not necessarily recommendable for normal users
and it needs to be hidden behind an opt-in knob (I do not have a
strong opinion if that is or is not the case for this particular
feature at this point), the documentation for the knob should give a
bit more than "This chang(e)s the behavior" to the readers, I would
think, to be intellectually honest ;-).  Let's tell them what bad
things happen if we pretend that we switched the branch without
twoway merge and the index update to help them make an informed
decision.



I attempted to explain what the change in behavior was in the same 
sentence by saying what it skips and what it still does.  If that isn't 
intellectually honest, help me know how to better explain it so that it is.


Is this better?

Speed up "git checkout -b " by skipping the twoway merge and 
index update.  Instead, just create a new branch named  and 
switch to it.  The working directory and index are left unchanged.



+static int needs_working_tree_merge(const struct checkout_opts *opts,
+   const struct branch_info *old_branch_info,
+   const struct branch_info *new_branch_info)
+{
+   /*
+* We must do the merge if we are actually moving to a new
+* commit tree.


What's a "commit tree"?  Shouldn't it be just a "commit"?



Sure


+*/
+   if (!old_branch_info->commit || !new_branch_info->commit ||
+   oidcmp(&old_branch_info->commit->object.oid, 
&new_branch_info->commit->object.oid))
+   return 1;
+
+   /*
+* opts->patch_mode cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* opts->quiet only impacts output so doesn't require a merge
+*/
+
+   /*
+* Honor the explicit request for a three-way merge or to throw away
+* local changes
+*/
+   if (opts->merge || opts->force)
+   return 1;
+
+   /*
+* --detach is documented as "updating the index and the files in the
+* working tree" but this optimization skips those steps so fall through
+* to the regular code path.
+*/
+   if (opts->force_detach)
+   return 1;
+
+   /*
+* opts->writeout_stage cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* Honor the explicit ignore requests
+*/
+   if (!opts->over

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-24 Thread Ben Peart





On 7/24/2018 2:42 PM, Eric Sunshine wrote:

On Tue, Jul 24, 2018 at 2:01 PM Ben Peart  wrote:

If the new core.optimizecheckout config setting is set to true, speed up


Maybe:

 Add core.optimizeCheckout config setting which, when true, speeds up



Sure


"git checkout -b foo" by avoiding the work to merge the working tree.  This
is valid because no merge needs to occur - only creating the new branch/
updating the refs. Any other options force it through the old code path.

This change in behavior is off by default and behind the config setting so
that users have to opt-in to the optimized behavior.

We've been running with this patch internally for a long time but it was
rejected when I submitted it to the mailing list before because it
implicitly changes the behavior of checkout -b. Trying it again configured
behind a config setting as a potential solution for other optimizations to
checkout that could change the behavior as well.


This paragraph is mere commentary which probably belongs below the
"---" line following your sign-off.



Hopefully this commentary (I'll move it below the --- line) is clearer:

We've been running with this patch internally for a long time but it was
rejected when I submitted it to the mailing list before [1] because it
implicitly changes the behavior of checkout -b as it no longer updates
the working directory.

I'm submitting it again behind a config setting so that it doesn't cause
any back compat issues unless the user explicitly opts in to the new
behavior. My hope is this same setting and model can be used if/when we
make other performance optimizations to checkout like using the cache
tree to avoid having to traverse the entire tree being discussed [2].

[1] 
https://public-inbox.org/git/20160909192520.4812-1-benpe...@microsoft.com/
[2] 
https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469



https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469


Is this link meant to reference the previous attempt of optimizing
"checkout -b"? Although there's a single mention of "checkout -b" in
that discussion, it doesn't seem to be the previous attempt or explain
why it was rejected.

It would be quite nice to see a discussion in both the commit message
and the documentation about the pros and cons of enabling this
optimization. That it was previously rejected suggests that there may
be serious or unexpected consequences. How will a typical user know
whether its use is desirable or not?


Signed-off-by: Ben Peart 
---
diff --git a/Documentation/config.txt b/Documentation/config.txt
@@ -911,6 +911,12 @@ core.commitGraph::
+core.optimizedCheckout
+   Speed up "git checkout -b foo" by skipping much of the work of a
+   full checkout command.  This changs the behavior as it will skip


s/changs/changes/


+   merging the trees and updating the index and instead only create
+   and switch to the new ref.
diff --git a/builtin/checkout.c b/builtin/checkout.c
@@ -471,6 +475,88 @@ static void setup_branch_path(struct branch_info *branch)
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+   const struct branch_info *old_branch_info,
+   const struct branch_info *new_branch_info)
+{
+   /*
+* We must do the merge if we are actually moving to a new
+* commit tree.
+*/
+   if (!old_branch_info->commit || !new_branch_info->commit ||
+   oidcmp(&old_branch_info->commit->object.oid, 
&new_branch_info->commit->object.oid))
+   return 1;
+   [...]
+   return 0;
+}


This long list of special-case checks doesn't leave me too enthused,
however, that aside, this approach seems backward. Rather than erring
on the side of safety by falling back to the merging behavior, it errs
in the other direction, which may be a problem if this list of
special-case checks ever gets out of sync with 'checkout_opts'. That
is, if someone adds a new option which ought to employ the merging
behavior, but forgets to update this function, then this function will
incorrectly default to using the optimization.



I'm not thrilled with the long list either (the plethora of comments 
probably makes it appear worse than it is) but I don't see how flipping 
the logic around makes it fail if someone adds a new option.  The "meets 
all criteria for optimization" code can only test existing options.



A safer approach would be the inverse, namely:

 static int skip_worktree_merge(...)
 {
 if (...meets all criteria for optimization...)
 return 1;
 return 0;
 }


  static int merge_working_tree(const struct checkout_opts *opts,
   struct branch_info *old_branch_info,
   struct branch_info *new_branch_info,
{
+   /*
+* Skip merging the trees, updating the index, and work tree only i

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-24 Thread Junio C Hamano

Ben Peart  writes:

> From: Ben Peart 
>
> If the new core.optimizecheckout config setting is set to true, speed up
> "git checkout -b foo" by avoiding the work to merge the working tree.  This
> is valid because no merge needs to occur - only creating the new branch/
> updating the refs. Any other options force it through the old code path.
>
> This change in behavior is off by default and behind the config setting so
> that users have to opt-in to the optimized behavior.




> We've been running with this patch internally for a long time but it was
> rejected when I submitted it to the mailing list before because it
> implicitly changes the behavior of checkout -b. Trying it again configured
> behind a config setting as a potential solution for other optimizations to
> checkout that could change the behavior as well.
>
> https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469

An incorrect link?  It does not look like a thread that explains
what was previously submitted but failed.  The last paragraph looks
like a fine material below the three-dash line.


> Signed-off-by: Ben Peart 
> ---
>
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/f43d934ce7
> Checkout: git fetch https://github.com/benpeart/git checkout-b-v1 && git 
> checkout f43d934ce7
>
>  Documentation/config.txt |  6 +++
>  builtin/checkout.c   | 94 
>  cache.h  |  1 +
>  config.c |  5 +++
>  environment.c|  1 +
>  5 files changed, 107 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a32172a43c..2c4f513bf1 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -911,6 +911,12 @@ core.commitGraph::
>   Enable git commit graph feature. Allows reading from the
>   commit-graph file.
>  
> +core.optimizedCheckout
> + Speed up "git checkout -b foo" by skipping much of the work of a
> + full checkout command.  This changs the behavior as it will skip
> + merging the trees and updating the index and instead only create
> + and switch to the new ref.

By the way, why is it a core.* thing, not checkout.* thing?

If a new feature is not necessarily recommendable for normal users
and it needs to be hidden behind an opt-in knob (I do not have a
strong opinion if that is or is not the case for this particular
feature at this point), the documentation for the knob should give a
bit more than "This chang(e)s the behavior" to the readers, I would
think, to be intellectually honest ;-).  Let's tell them what bad
things happen if we pretend that we switched the branch without
twoway merge and the index update to help them make an informed
decision.

> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> + const struct branch_info *old_branch_info,
> + const struct branch_info *new_branch_info)
> +{
> + /*
> +  * We must do the merge if we are actually moving to a new
> +  * commit tree.

What's a "commit tree"?  Shouldn't it be just a "commit"?

> +  */
> + if (!old_branch_info->commit || !new_branch_info->commit ||
> + oidcmp(&old_branch_info->commit->object.oid, 
> &new_branch_info->commit->object.oid))
> + return 1;
> +
> + /*
> +  * opts->patch_mode cannot be used with switching branches so is
> +  * not tested here
> +  */
> +
> + /*
> +  * opts->quiet only impacts output so doesn't require a merge
> +  */
> +
> + /*
> +  * Honor the explicit request for a three-way merge or to throw away
> +  * local changes
> +  */
> + if (opts->merge || opts->force)
> + return 1;
> +
> + /*
> +  * --detach is documented as "updating the index and the files in the
> +  * working tree" but this optimization skips those steps so fall through
> +  * to the regular code path.
> +  */
> + if (opts->force_detach)
> + return 1;
> +
> + /*
> +  * opts->writeout_stage cannot be used with switching branches so is
> +  * not tested here
> +  */
> +
> + /*
> +  * Honor the explicit ignore requests
> +  */
> + if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
> + opts->ignore_other_worktrees)
> + return 1;
> +
> + /*
> +  * opts->show_progress only impacts output so doesn't require a merge
> +  */
> +
> + /*
> +  * If we aren't creating a new branch any changes or updates will
> +  * happen in the existing branch.  Since that could only be updating
> +  * the index and working directory, we don't want to skip those steps
> +  * or we've defeated any purpose in running the command.
> +  */
> + if (!opts->new_branch)
> + return 1;
> +
> + /*
> +  * new_branch_force is defined to "create/reset and checkout a branch"
> +

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

2018-07-24 Thread Eric Sunshine

On Tue, Jul 24, 2018 at 2:01 PM Ben Peart  wrote:
> If the new core.optimizecheckout config setting is set to true, speed up

Maybe:

Add core.optimizeCheckout config setting which, when true, speeds up

> "git checkout -b foo" by avoiding the work to merge the working tree.  This
> is valid because no merge needs to occur - only creating the new branch/
> updating the refs. Any other options force it through the old code path.
>
> This change in behavior is off by default and behind the config setting so
> that users have to opt-in to the optimized behavior.
>
> We've been running with this patch internally for a long time but it was
> rejected when I submitted it to the mailing list before because it
> implicitly changes the behavior of checkout -b. Trying it again configured
> behind a config setting as a potential solution for other optimizations to
> checkout that could change the behavior as well.

This paragraph is mere commentary which probably belongs below the
"---" line following your sign-off.

> https://public-inbox.org/git/20180724042740.gb13...@sigill.intra.peff.net/T/#m75afe3ab318d23f36334cf3a6e3d058839592469

Is this link meant to reference the previous attempt of optimizing
"checkout -b"? Although there's a single mention of "checkout -b" in
that discussion, it doesn't seem to be the previous attempt or explain
why it was rejected.

It would be quite nice to see a discussion in both the commit message
and the documentation about the pros and cons of enabling this
optimization. That it was previously rejected suggests that there may
be serious or unexpected consequences. How will a typical user know
whether its use is desirable or not?

> Signed-off-by: Ben Peart 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -911,6 +911,12 @@ core.commitGraph::
> +core.optimizedCheckout
> +   Speed up "git checkout -b foo" by skipping much of the work of a
> +   full checkout command.  This changs the behavior as it will skip

s/changs/changes/

> +   merging the trees and updating the index and instead only create
> +   and switch to the new ref.
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> @@ -471,6 +475,88 @@ static void setup_branch_path(struct branch_info *branch)
> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> +   const struct branch_info *old_branch_info,
> +   const struct branch_info *new_branch_info)
> +{
> +   /*
> +* We must do the merge if we are actually moving to a new
> +* commit tree.
> +*/
> +   if (!old_branch_info->commit || !new_branch_info->commit ||
> +   oidcmp(&old_branch_info->commit->object.oid, 
> &new_branch_info->commit->object.oid))
> +   return 1;
> +   [...]
> +   return 0;
> +}

This long list of special-case checks doesn't leave me too enthused,
however, that aside, this approach seems backward. Rather than erring
on the side of safety by falling back to the merging behavior, it errs
in the other direction, which may be a problem if this list of
special-case checks ever gets out of sync with 'checkout_opts'. That
is, if someone adds a new option which ought to employ the merging
behavior, but forgets to update this function, then this function will
incorrectly default to using the optimization.

A safer approach would be the inverse, namely:

static int skip_worktree_merge(...)
{
if (...meets all criteria for optimization...)
return 1;
return 0;
}

>  static int merge_working_tree(const struct checkout_opts *opts,
>   struct branch_info *old_branch_info,
>   struct branch_info *new_branch_info,
> {
> +   /*
> +* Skip merging the trees, updating the index, and work tree only if 
> we
> +* are simply creating a new branch via "git checkout -b foo."  Any
> +* other options or usage will continue to do all these steps.
> +*/
> +   if (core_optimize_checkout && !needs_working_tree_merge(opts, 
> old_branch_info, new_branch_info))
> +   return 0;

This seems a somewhat odd place to hook in this optimization,
especially as there is only a single caller of this function. Instead,
one might expect the caller itself to make this judgment and avoid
trying the merge in the first place if not needed. That is, in
switch_branches:

if (!skip_worktree_merge(...))
ret = merge_working_tree(...);

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

7 matches

Site Navigation

Mail list logo

Footer information