Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-28 Thread Junio C Hamano
"Ben Peart"  writes:

> The fact that "git checkout -b NEW" updates the index and as a
> result reflects any changes in the sparse-checkout and the issue 
> Junio pointed out earlier about not calling show_local_changes 
> at the end of merge_working_tree are the only difference in behavior
> I am aware of.  Both of these are easily rectified.
>
> That said, given we are skipping huge amounts of work by no longer 
> merging the commit trees, generating a new index, and merging the 
> local modifications in the working tree, it is possible that there are
> other behavior changes I'm just not aware of.

That is OK.  It is not ok to leave such bugs at the end of the
development before the topic is merged to 'master' to be delivered
to the end users, but you do not have to fight alone to produce a
perfect piece of code with your first attempt.  That's what the
reviews and testing period are for.

If you are shooting for the same behaviour, then that is much better
than "make 'checkout -b NEW' be equivalent to a sequence of
update-ref && symbolic-ref, which is different from others", which
was the second explanation you gave earlier.  I am much happier with
that goal.

But if that is the case, I really do not see any point of singling
out "-b NEW" case.  The following property MUST be kept:

 (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
 and "git checkout HEAD" (no other parameters to any of them)
 ought to give identical index and working tree.  It is too
 confusing to leave subtly different results that will lead to
 hard to diagnose bugs for only one of them.

That would make the "do we skip unpack_trees() call?" decision a lot
simpler to make, I would suspect.  We only need to see "are the two
trees we would fed unpack_trees() the same as HEAD's tree?" and do
not have to look at new_branch and other irrelevant things at all.
What happens in the ref namespace is immaterial, as making or
skipping an unpack_trees() call would not affect anything other than
the resulting index and the working tree.  If we want to keep that
sparse-checkout wart, we would also need to see if the control file
sparse-checkout keeps in $GIT_DIR/ exists, but the result will be
much simpler set of rules, and would hopefully help remove the "the
optimization kicks in following logic that is an unreviewable-mess"
issue.





RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-28 Thread Ben Peart
Resending

> -Original Message-
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
> Behalf Of Philip Oakley
> Sent: Saturday, September 24, 2016 3:31 PM
> To: Junio C Hamano <gits...@pobox.com>
> Cc: Ben Peart <ben.pe...@microsoft.com>; pclo...@gmail.com;
> git@vger.kernel.org
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Hi Junio,
> 
> From: "Junio C Hamano" <gits...@pobox.com>
> > "Philip Oakley" <philipoak...@iee.org> writes:
> >
> >>> > >"git checkout -b foo" (without -f -m or ) is defined
> >>> > >in the manual as being a shortcut for/equivalent to:
> >>> > >
> >>> > >(1a) "git branch foo"
> >>> > >(1b) "git checkout foo"
> >>> > >
> >>> > >However, it has been our experience in our observed use cases and
> >>> > >all the existing git tests, that it can be treated as equivalent
to:
> >>> > >
> >>> > >(2a) "git branch foo"
> >>> > >(2b) "git symbolic-ref HEAD refs/heads/foo"
> >>> > >...
> >>> > >
> >>> > I am still not sure if I like the change of what "checkout -b" is
> >>> > this late in the game, though.
> >>>
> >>> ...
> >>> That said, you're much more on the frontline of receiving negative
> >>> feedback about doing that than I am. :)  How would you like to
> >>> proceed?
> >>
> >> I didn't see an initial confirmation as to what the issue really was.
> >> You indicated the symptom ('a long checkout time'), but then we
> >> missed out on hard facts and example repos, so that the issue was
> >> replicable.
> >
> > I took it as a given, trivial and obvious optimization opportunity,
> > that it is wasteful having to traverse two trees to consolidate and
> > reflect their differences into the working tree when we know upfront
> > that these two trees are identical, no matter what the overhead for
> > doing so is.
> 
> I agree, and I believe Ben agrees.
> 

Correct.  In my original patch request I put more specific information on 
the impact this optimization has in our specific case (reducing the cost 
from 166 seconds to 16 seconds).

> >
> >> At the moment there is the simple workaround of an alias that
> >> executes that two step command dance to achieve what you needed, and
> >> Junio has outlined the issues he needed to be covered from his
> >> maintainer perspective (e.g. the detection of sparse checkouts).
> >> Confirming the root causes would help in setting a baseline.
> >>
> >> I hope that is of help - I'd seen that the discussion had gone quiet.
> >
> > Some of the problems I have are:
> >
> > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
> > and "git checkout HEAD" (no other parameters to any of them)
> > ought to give identical index and working tree.  It is too
> > confusing to leave subtly different results that will lead to
> > hard to diagnose bugs for only one of them.
> >
> > (2) The proposed log message talks only about "performance
> > optimization",
> 
> >while the purpose of the change is more
> > about
> > changing the definition
> 
> Here I think is the misunderstanding. His purpose is NOT to change the
> definition (IIUC). As I read the message you reference below (and Ben's
other
> messages), I understood that he was trying to achieve what you said (i.e.
> optimise the trivial and obvious opportunity) of selecting for the common
> case (underlying conditions) where the two command sequences are
> identical. If the selected case / conditions is not identical then it is
defined
> wrongly...
> 
> I suspect that it was Ben's 'soft' explanation that allowed the discussion
to
> diverge.
> 

I'm unaccustomed to doing reviews like this via email so have been 
struggling with how to most effectively communicate about the proposed
change.  I appreciate any help and understanding as I go through this
for the first time.

My intention was not to change the users expected results which
I believe are to "create a new branch and switch to it."  We reinforce
that expectation with the output of the command which completes 
with the text "Switched to a new branch 'foo'"

> 
> >   

Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-24 Thread Philip Oakley

Hi Junio,

From: "Junio C Hamano" 

"Philip Oakley"  writes:


> >"git checkout -b foo" (without -f -m or ) is defined in
> >the manual as being a shortcut for/equivalent to:
> >
> >(1a) "git branch foo"
> >(1b) "git checkout foo"
> >
> >However, it has been our experience in our observed use cases and all
> >the existing git tests, that it can be treated as equivalent to:
> >
> >(2a) "git branch foo"
> >(2b) "git symbolic-ref HEAD refs/heads/foo"
> >...
> >
> I am still not sure if I like the change of what "checkout -b" is this
> late in the game, though.

...
That said, you're much more on the frontline of receiving negative
feedback about doing that than I am. :)  How would you like to
proceed?


I didn't see an initial confirmation as to what the issue really
was. You indicated the symptom ('a long checkout time'), but then we
missed out on hard facts and example repos, so that the issue was
replicable.


I took it as a given, trivial and obvious optimization opportunity,
that it is wasteful having to traverse two trees to consolidate and
reflect their differences into the working tree when we know upfront
that these two trees are identical, no matter what the overhead for
doing so is.


I agree, and I believe Ben agrees.




At the moment there is the simple workaround of an alias that executes
that two step command dance to achieve what you needed, and Junio has
outlined the issues he needed to be covered from his maintainer
perspective (e.g. the detection of sparse checkouts). Confirming the
root causes would help in setting a baseline.

I hope that is of help - I'd seen that the discussion had gone quiet.


Some of the problems I have are:

(1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
and "git checkout HEAD" (no other parameters to any of them)
ought to give identical index and working tree.  It is too
confusing to leave subtly different results that will lead to
hard to diagnose bugs for only one of them.

(2) The proposed log message talks only about "performance
optimization",


   while the purpose of the change is more 
about

changing the definition


Here I think is the misunderstanding. His purpose is NOT to change the 
definition (IIUC). As I read the message you reference below (and Ben's 
other messages), I understood that he was trying to achieve what you said 
(i.e. optimise the trivial and obvious opportunity) of selecting for the 
common case (underlying conditions) where the two command sequences are 
identical. If the selected case / conditions is not identical then it is 
defined wrongly...


I suspect that it was Ben's 'soft' explanation that allowed the discussion 
to diverge.



of what "git checkout -b 
NEW" is from

"git branch NEW && git checkout NEW" to "git branch NEW && git
symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
a much better job contrasting the two.

(3) I identified only one difference as an example sufficient to
point out why the patch provided is not a pure optimization but
behaviour change.  Fixing that example alone to avoid change in
the behaviour is trivial (see if the "info/sparse-checkout"
file is present and refrain from skipping the proper checkout),


This is probably the point Ben needs to take on board to narrow the 
conditions down. There may be others.



but a much larger problem is that I do not know (and Ben does
not, I suspect) know what other behaviour changes the patch is
introducing, and worse, the checks are sufficiently dense too
detailed and intimate to the implementation of unpack_trees()
that it is impossible for anybody to make sure the exceptions
defined in this patch and updates to other parts of the system
will be kept in sync.


I did not believe he was proposing such a change to behaviour, hence his 
difficulty in responding (or at least that is my perception). I.e. he was 
digging a hole in the wrong place.


It is possible that he had accidentally introduced a behavious change, and 
having failed to explictly say "This patch (should) produces no behavious 
change", which then continued to re-inforce the misunderstanding.




So my inclination at this point, unless we see somebody invents a
clever way to solve (3), is that any change that violates (1),
i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
something subtly different", is not acceptable, and solving (3) in a
maintainable way smells like quite a hard thing to do.  But it would
be ideal if (3) is solved cleanly, as we will then not have to worry
about changing behaviour at all and can apply the optimization for
all of the four cases equally.  As a side effect, that approach
would solve problem (2) above.

If we were to punt 

Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-24 Thread Junio C Hamano
"Philip Oakley"  writes:

>> > >"git checkout -b foo" (without -f -m or ) is defined in
>> > >the manual as being a shortcut for/equivalent to:
>> > >
>> > >(1a) "git branch foo"
>> > >(1b) "git checkout foo"
>> > >
>> > >However, it has been our experience in our observed use cases and all
>> > >the existing git tests, that it can be treated as equivalent to:
>> > >
>> > >(2a) "git branch foo"
>> > >(2b) "git symbolic-ref HEAD refs/heads/foo"
>> > >...
>> > >
>> > I am still not sure if I like the change of what "checkout -b" is this
>> > late in the game, though.
>>
>> ...
>> That said, you're much more on the frontline of receiving negative
>> feedback about doing that than I am. :)  How would you like to
>> proceed?
>
> I didn't see an initial confirmation as to what the issue really
> was. You indicated the symptom ('a long checkout time'), but then we
> missed out on hard facts and example repos, so that the issue was
> replicable.

I took it as a given, trivial and obvious optimization opportunity,
that it is wasteful having to traverse two trees to consolidate and
reflect their differences into the working tree when we know upfront
that these two trees are identical, no matter what the overhead for
doing so is.

> At the moment there is the simple workaround of an alias that executes
> that two step command dance to achieve what you needed, and Junio has
> outlined the issues he needed to be covered from his maintainer
> perspective (e.g. the detection of sparse checkouts). Confirming the
> root causes would help in setting a baseline.
>
> I hope that is of help - I'd seen that the discussion had gone quiet.

Some of the problems I have are:

 (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
 and "git checkout HEAD" (no other parameters to any of them)
 ought to give identical index and working tree.  It is too
 confusing to leave subtly different results that will lead to
 hard to diagnose bugs for only one of them.

 (2) The proposed log message talks only about "performance
 optimization", while the purpose of the change is more about
 changing the definition of what "git checkout -b NEW" is from
 "git branch NEW && git checkout NEW" to "git branch NEW && git
 symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
 later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
 a much better job contrasting the two.

 (3) I identified only one difference as an example sufficient to
 point out why the patch provided is not a pure optimization but
 behaviour change.  Fixing that example alone to avoid change in
 the behaviour is trivial (see if the "info/sparse-checkout"
 file is present and refrain from skipping the proper checkout),
 but a much larger problem is that I do not know (and Ben does
 not, I suspect) know what other behaviour changes the patch is
 introducing, and worse, the checks are sufficiently dense too
 detailed and intimate to the implementation of unpack_trees()
 that it is impossible for anybody to make sure the exceptions
 defined in this patch and updates to other parts of the system
 will be kept in sync.

So my inclination at this point, unless we see somebody invents a
clever way to solve (3), is that any change that violates (1),
i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
something subtly different", is not acceptable, and solving (3) in a
maintainable way smells like quite a hard thing to do.  But it would
be ideal if (3) is solved cleanly, as we will then not have to worry
about changing behaviour at all and can apply the optimization for
all of the four cases equally.  As a side effect, that approach
would solve problem (2) above.

If we were to punt on keeping the sanity (1) and introduce a subtly
different "create a new branch and point the HEAD at it", an easier
way out may be be one of

 1. a totally new command, e.g. "git branch-switch NEW" that takes
only a single argument and no other "checkout" options, or

 2. a new option to "git checkout" that takes _ONLY_ a single
argument and incompatible with any other option or command line
argument, or

 3. an alias that does "git branch" followed by "git symbolic-ref".

Neither of the first two sounds palatable, though.







Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-24 Thread Philip Oakley

Ben,
Using a 'bottom / in-line' posting flow is much preferred, which may require 
some manual editing[1], hopefully I have it about right...

Philip
--
[1] this is massaged and mangled Outlook Express, sometimes one has to work 
with the tools at hand...


From: "Ben Peart" 

From: Junio C Hamano [mailto:gits...@pobox.com]
> Junio C Hamano 
> >"git checkout -b foo" (without -f -m or ) is defined in
> >the manual as being a shortcut for/equivalent to:
> >
> >(1a) "git branch foo"
> >(1b) "git checkout foo"
> >
> >However, it has been our experience in our observed use cases and all
> >the existing git tests, that it can be treated as equivalent to:
> >
> >(2a) "git branch foo"
> >(2b) "git symbolic-ref HEAD refs/heads/foo"
> >...
> >
> I am still not sure if I like the change of what "checkout -b" is this
> late in the game, though.
>
> Having said all that.
>
> I do see the merit of having a shorthand way to invoke your 2 above.
> It is just that I am not convinced that it is the best way to achieve 
> that goal to redefine what "git checkout -b " (no other 
> parameters) does.

>
---

I understand the reluctance to change the existing behavior of the "git 
checkout -b " command.


I see this as a tradeoff between taking advantage of the muscle memory for 
the existing command and coming up with a new shortcut command and 
training people to use it instead.


The fact that all the use cases we've observed and all the git test cases 
actually produce the same results but significantly faster with that 
change in behavior made me hope we could redefine the command to take 
advantage of the muscle memory.


That said, you're much more on the frontline of receiving negative 
feedback about doing that than I am. :)  How would you like to proceed?


The discussion can often feel harsh [2], especially if there is accidental 
'talking past each other', which is usually because of differing 
perspectives on the issues.


I didn't see an initial confirmation as to what the issue really was. You 
indicated the symptom ('a long checkout time'), but then we missed out on 
hard facts and example repos, so that the issue was replicable.


Is there an example public repo that you can show the issue on? (or 
anonymise a private one - there is a script for that [3])


Can you give local timings (and indication of the hardware and software 
versions used for the test, and if appropriate, network setup)?


I know at my work that sometime our home drives are multiply mapped to H:, a 
C:/homedrive directory and a $netshare/me network directory via the 
Microsofy roaming profiles, and if there is hard synchronization (or 
whatever term is appropriate) there can be sudden slowdowns as local C: 
writes drop from 'instant' to 'forever'...


Is there anything special about the repos that have the delays? Is it a 
local process issue that causes the repos to develop those symptoms (see 
above about not being sure why you have these issues), in which case it 
could be local self inflicted issues, or it could be that you have a 
regulatory issue for that domain that requires such symptoms, which would 
shift the problem from a 'don't do that' response to a 'hmm, how to cover 
this'.



At the moment there is the simple workaround of an alias that executes that 
two step command dance to achieve what you needed, and Junio has outlined 
the issues he needed to be covered from his maintainer perspective (e.g. the 
detection of sparse checkouts). Confirming the root causes would help in 
setting a baseline.


I hope that is of help - I'd seen that the discussion had gone quiet.

--
Philip


[2] Been there, feel your pain. It's not in any way malicious, just a 
reflection that email can be a poor medium for such discussions.
[3] https://public-inbox.org/git/20140827170127.ga6...@peff.net/ suggest 
that the `git fast-export --anonymize --all` maybe the approach. 



RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-21 Thread Ben Peart
I understand the reluctance to change the existing behavior of the "git 
checkout -b " command.

I see this as a tradeoff between taking advantage of the muscle memory for the 
existing command and coming up with a new shortcut command and training people 
to use it instead.

The fact that all the use cases we've observed and all the git test cases 
actually produce the same results but significantly faster with that change in 
behavior made me hope we could redefine the command to take advantage of the 
muscle memory.

That said, you're much more on the frontline of receiving negative feedback 
about doing that than I am. :)  How would you like to proceed?

Ben

-Original Message-
From: Junio C Hamano [mailto:gits...@pobox.com] 
Sent: Monday, September 19, 2016 1:04 PM
To: Ben Peart <peart...@gmail.com>
Cc: pclo...@gmail.com; Ben Peart <ben.pe...@microsoft.com>; git@vger.kernel.org
Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial 
checkout

Junio C Hamano <gits...@pobox.com> writes:

>> "git checkout -b foo" (without -f -m or ) is defined in 
>> the manual as being a shortcut for/equivalent to:
>>  
>> (1a) "git branch foo"
>> (1b) "git checkout foo"
>>  
>> However, it has been our experience in our observed use cases and all 
>> the existing git tests, that it can be treated as equivalent to:
>>  
>> (2a) "git branch foo"
>> (2b) "git symbolic-ref HEAD refs/heads/foo"
>> ...
>
> I am still not sure if I like the change of what "checkout -b" is this 
> late in the game, though.

Having said all that.

I do see the merit of having a shorthand way to invoke your 2 above.
It is just that I am not convinced that it is the best way to achieve that goal 
to redefine what "git checkout -b " (no other parameters) does.


Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-19 Thread Junio C Hamano
Junio C Hamano  writes:

>> "git checkout -b foo" (without -f -m or ) is defined in the
>> manual as being a shortcut for/equivalent to:
>>  
>> (1a) "git branch foo"
>> (1b) "git checkout foo"
>>  
>> However, it has been our experience in our observed use cases and all the
>> existing git tests, that it can be treated as equivalent to:
>>  
>> (2a) "git branch foo"
>> (2b) "git symbolic-ref HEAD refs/heads/foo"
>> ...
>
> I am still not sure if I like the change of what "checkout -b" is
> this late in the game, though.

Having said all that.

I do see the merit of having a shorthand way to invoke your 2 above.
It is just that I am not convinced that it is the best way to
achieve that goal to redefine what "git checkout -b " (no
other parameters) does.


Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-19 Thread Junio C Hamano
"Ben Peart"  writes:

> Let me see if I can better explain what I’m trying to accomplish with this
> patch.  
>  
> "git checkout -b foo" (without -f -m or ) is defined in the
> manual as being a shortcut for/equivalent to:
>  
> (1a) "git branch foo"
> (1b) "git checkout foo"
>  
> However, it has been our experience in our observed use cases and all the
> existing git tests, that it can be treated as equivalent to:
>  
> (2a) "git branch foo"
> (2b) "git symbolic-ref HEAD refs/heads/foo"
>  
> That is, the common perception (use case) is to just create a new branch
> "foo" (pointing at the current commit) and point HEAD at it WITHOUT making
> any changes to the index or worktree.
>  
> However, the (1b) command has "git reset" connotations in that it should
> examine and manipulate the trees, index, and worktree in the expectation
> that there MIGHT be work to do.
>  
> Since this additional work in (1b) takes minutes on large repos and (2b)
> takes less than a second, my intent was to identify the conditions that this
> additional work will have no affect and thereby avoid it.
>  
> Alternatively, was the "-b" option just created as a shortcut only to avoid
> calling the separate "git branch foo" command and we should not think about
> the common perception and usage?

If you are trying to change the definition of "checkout -b" from 1
to 2 above, that is a completely different issue.  I thought this
was an attempt to optimize for the performance without changing the
behaviour.

So if you did not apologize like this...

> It is correct that this optimization will skip updating the tree to honor
> any changes to the sparse-checkout in the case of creating a new branch.
> Unfortunately, I don't know of any way to detect the changes other than
> actually doing all the work to update the skip work tree bit in the index.

... but insisted that skipping the yucky sparse-checkout adjustment
in this case was an intended behaviour change, I would have
understood (not necessarily agreed, though) what you were trying to
do.

> Beyond this code review process and testing, I don't know how else we make
> sure we're caught all the conditions where we are OK skipping some of the
> steps. Any change has inherent risk - a change in behavior even more so.

At least we made one-step progress today.  I now know that you are
trying to change the behaviour, but I didn't know that last week,
when I was primarily reacting that your claim that this was
performance thing and assuming you meant no change in behaviour, but
there was clearly behaviour change, and it was apparent that the
denseness of the code made it almost impossible to see if there are
unintended changes.

I am still not sure if I like the change of what "checkout -b" is
this late in the game, though.


RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-19 Thread Ben Peart
Let me see if I can better explain what I’m trying to accomplish with this
patch.  
 
"git checkout -b foo" (without -f -m or ) is defined in the
manual as being a shortcut for/equivalent to:
 
    (1a) "git branch foo"
    (1b) "git checkout foo"
 
However, it has been our experience in our observed use cases and all the
existing git tests, that it can be treated as equivalent to:
 
    (2a) "git branch foo"
    (2b) "git symbolic-ref HEAD refs/heads/foo"
 
That is, the common perception (use case) is to just create a new branch
"foo" (pointing at the current commit) and point HEAD at it WITHOUT making
any changes to the index or worktree.
 
However, the (1b) command has "git reset" connotations in that it should
examine and manipulate the trees, index, and worktree in the expectation
that there MIGHT be work to do.
 
Since this additional work in (1b) takes minutes on large repos and (2b)
takes less than a second, my intent was to identify the conditions that this
additional work will have no affect and thereby avoid it.
 
Alternatively, was the "-b" option just created as a shortcut only to avoid
calling the separate "git branch foo" command and we should not think about
the common perception and usage?
 
More comments inline...
 
> -Original Message-
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Tuesday, September 13, 2016 6:35 PM
> To: Ben Peart <mailto:peart...@gmail.com>
> Cc: mailto:git@vger.kernel.org; mailto:pclo...@gmail.com; Ben Peart
> <mailto:ben.pe...@microsoft.com>
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Ben Peart <mailto:peart...@gmail.com> writes:
> 
> > +static int needs_working_tree_merge(const struct checkout_opts *opts,
> > +  const struct branch_info *old,
> > +  const struct branch_info *new)
> > +{
> > +...
> > +}
> 
> I do not think I need to repeat the same remarks on the conditions in this
> helper, which hasn't changed since v2.  Many "comments" in the code do not
> explain why skipping is justified, or what they claim to check looks to me
just
> plain wrong.
> 
> For example, there is
> 
>    /*
> * If we're not creating a new branch, by definition we're changing
> * the existing one so need to do the merge
> */
>    if (!opts->new_branch)
>    return 1;
> 
> but "git checkout" (no other argument) hits this condition.  It disables
the
> most trivial optimization opportunity, because we are not "creating".
> 
 
Disabling the optimization for "git checkout" with no argument was
intentional. This command does not create a new branch but instead, performs
a "soft reset" which will update the index and working directory to reflect
changes to the sparse-checkout (for example).  If this was not disabled,
many tests fail as they expect this behavior.  Because "git checkout" does
not actually change the refs, if we skipped the merge/index/working
directory update, this command becomes a no-op.
 
> "By definition, we're changing"?  Really?  Not quite.
> 
 
What I was attempting to communicate is that if we aren't creating a new
branch any changes or updates will happen in the existing branch.  Since
that could only be updating the index and working directory, we don't want
to skip those steps or we've defeated any purpose in running the command. 
 
> If you disable this bogus check, "git checkout" (no other argument) would
be
> allowed to skip the merge_working_tree(), and that in turn reveals another
> case that the helper is not checking when
> unpack_trees() MUST be called.
> 
> Note: namely, when sparse checkout is in effect, switching from
> HEAD to HEAD can nuke existing working tree files outside the
> sparse pattern -- YUCK!  See penultimate test in t1011 for
> an example.
> 
> This yuckiness is not your fault, but needs_working_tree_merge() logic you
> added needs to refrain from skipping unpack_trees() call when sparse thing
> is in effect.  I'd expect "git checkout -b foo"
> instead of "git checkout" (no other argument) would fail to honor the
sparse
> thing and reveal this bug, because the above bogus "!opts->new_branch"
> check will not protect you for that case.
> 
 
It is correct that this optimization will skip updating the tree to honor
any changes to the sparse-checkout in the case of creating a new branch.
Unfortunately, I don't know of any way to detect the changes other than
actually doing all the work to update the skip work tree bit in the index.
If this behavior is required, then this optimization will need to

Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-14 Thread Junio C Hamano
Oleg Taranenko  writes:

> Sorry for bothering, why not introduce a brand new option like git
> checkout -b foo --skip-worktree-merge for such rare optimization use
> case?

I am not sure what problem such a new option solves.  How would you
describe and explain what "--skip-worktree-merge" option to the end
user?


Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-14 Thread Oleg Taranenko
Sorry for bothering, why not introduce a brand new option like git
checkout -b foo --skip-worktree-merge for such rare optimization use
case?

On Wed, Sep 14, 2016 at 12:34 AM, Junio C Hamano  wrote:
> Ben Peart  writes:
>
>> +static int needs_working_tree_merge(const struct checkout_opts *opts,
>> + const struct branch_info *old,
>> + const struct branch_info *new)
>> +{
>> +...
>> +}
>
> I do not think I need to repeat the same remarks on the conditions
> in this helper, which hasn't changed since v2.  Many "comments" in
> the code do not explain why skipping is justified, or what they
> claim to check looks to me just plain wrong.
>
> For example, there is
>
>/*
> * If we're not creating a new branch, by definition we're changing
> * the existing one so need to do the merge
> */
>if (!opts->new_branch)
>return 1;
>
> but "git checkout" (no other argument) hits this condition.  It
> disables the most trivial optimization opportunity, because we are
> not "creating".
>
> "By definition, we're changing"?  Really?  Not quite.
>
> If you disable this bogus check, "git checkout" (no other argument)
> would be allowed to skip the merge_working_tree(), and that in turn
> reveals another case that the helper is not checking when
> unpack_trees() MUST be called.
>
> Note: namely, when sparse checkout is in effect, switching from
> HEAD to HEAD can nuke existing working tree files outside the
> sparse pattern -- YUCK!  See penultimate test in t1011 for
> an example.
>
> This yuckiness is not your fault, but needs_working_tree_merge()
> logic you added needs to refrain from skipping unpack_trees() call
> when sparse thing is in effect.  I'd expect "git checkout -b foo"
> instead of "git checkout" (no other argument) would fail to honor
> the sparse thing and reveal this bug, because the above bogus
> "!opts->new_branch" check will not protect you for that case.
>
> In other words, these random series of "if (...) return 1" are bugs
> hiding other real bugs and we need to reason about which ones are
> bugs that are hiding what other bugs that are not covered by this
> function.  As Peff said earlier for v1, this is still an unreadable
> mess.  We need to figure out a way to make sure we are skipping on
> the right condition and not accidentally hiding a bug of failing to
> check the right condition.  I offhand do not have a good suggestion
> on this; sorry.
>
>>  static int merge_working_tree(const struct checkout_opts *opts,
>> struct branch_info *old,
>> struct branch_info *new,
>> int *writeout_error)
>>  {
>> + /*
>> +  * Optimize the performance of "git checkout -b foo" by avoiding
>> +  * the expensive merge, index and working directory updates if they
>> +  * are not needed.
>> +  */
>> + if (!needs_working_tree_merge(opts, old, new))
>> + return 0;
>> +
>>   int ret;
>>   struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
>
> With the change you made at the beginning of this function, it no
> longer compiles with -Wdecl-after-stmt, but that is the smallest of
> the problems.
>
> It is a small step in the right direction to move the call to the
> helper from the caller to this function, but it is a bit too small.
>
> Notice that the lines after the above context look like this:
>
> hold_locked_index(lock_file, 1);
> if (read_cache_preload(NULL) < 0)
> return error(_("index file corrupt"));
>
> resolve_undo_clear();
> if (opts->force) {
> ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
> if (ret)
> return ret;
> } else {
> struct tree_desc trees[2];
> ...
>
> I would have expected that the check goes inside the "else" thing
> that actually does a two-tree merge, and the helper loses the check
> with opts->force, at least.  That would still be a change smaller
> than desired, but at least a meaningful improvement compared to the
> previous one.  As I have already pointed out, in the "else" clause
> there is a check "is the index free of conflicted entries? if so
> error out", and that must be honored in !opt->force case, no matter
> what your needs_working_tree_merge() says.  I also was hoping that
> you would notice, when you were told about the unmerged check, by
> reading the remainder of the merge_working_tree(), that we need to
> call show_local_changes() when we are not doing force and when we
> are not quiet---returning early like the above patch will never be
> able to call that one downstream in the function.
>
> Regardless of what the actual checks end up to be, the right place
> to do this "optimization" would look more like:
>
>  builtin/checkout.c | 7 ++-
>  1 file changed, 6 

[PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-13 Thread Ben Peart
Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

merge_working_tree <-- 95%
unpack_trees <-- 80%
traverse_trees <-- 50%
cache_tree_update <-- 17%
mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406   trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667   performance: 17.442926555 s: read_index_from
name-hash.c:128 performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208   performance: 4.387713335 s: write_locked_index
trace.c:420 performance: 166.458921289 s: git command:
'c:\Users\benpeart\bin\git.exe' 
'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406   trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667   performance: 16.100742476 s: read_index_from
trace.c:420 performance: 16.461547867 s: git command: 
'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart 
---
 builtin/checkout.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..8b2f428 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -38,6 +38,10 @@ struct checkout_opts {
int ignore_skipworktree;
int ignore_other_worktrees;
int show_progress;
+   /*
+* If new checkout options are added, needs_working_tree_merge
+* should be updated accordingly.
+*/
 
const char *new_branch;
const char *new_branch_force;
@@ -460,11 +464,99 @@ static void setup_branch_path(struct branch_info *branch)
branch->path = strbuf_detach(, NULL);
 }
 
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+   const struct branch_info *old,
+   const struct branch_info *new)
+{
+   /*
+* We must do the merge if we are actually moving to a new
+* commit tree.
+*/
+   if (!old->commit || !new->commit ||
+   oidcmp(>commit->tree->object.oid, 
>commit->tree->object.oid))
+   return 1;
+
+   /*
+* opts->patch_mode cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* opts->quiet only impacts output so doesn't require a merge
+*/
+
+   /*
+* Honor the explicit request for a three-way merge or to throw away
+* local changes
+*/
+   if (opts->merge || opts->force)
+   return 1;
+
+   /*
+* Checking out the requested commit may require updating the working
+* directory and index, let the merge handle it.
+*/
+   if (opts->force_detach)
+   return 1;
+
+   /*
+* opts->writeout_stage cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* Honor the explicit ignore requests
+*/
+   if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
+   opts->ignore_other_worktrees)
+   return 1;
+
+   /*
+* opts->show_progress only impacts output so doesn't require a merge
+*/
+
+   /*
+* If we're not creating a new branch, by definition we're changing
+* the existing one so need to do the merge
+*/
+   if (!opts->new_branch)
+   return 1;
+
+   /*
+* new_branch_force is defined to "create/reset and checkout a branch"
+* so needs to go through the merge to do the reset
+*/
+   if (opts->new_branch_force)
+   return 1;
+
+   /*
+* A new orphaned branch requrires the index and the working tree to be
+* adjusted to 
+*/
+   if (opts->new_orphan_branch)
+   return 1;
+
+   /*
+* Remaining variables are not checkout options but used to track state
+* that doesn't trigger the need for a merge.
+*/
+
+   return 0;
+}
+
 static int merge_working_tree(const struct checkout_opts *opts,
  struct branch_info *old,
  struct branch_info *new,
  int *writeout_error)
 {
+   /*
+* Optimize the performance of "git checkout -b foo" by avoiding
+*