Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-07 Thread Junio C Hamano
David Turner  writes:

>> I am not convinced that doing an equivalent of write-tree when you
>> switch branches is the right approach in the first place.  You will
>> eventually write it out as a tree, and having a relatively undamaged
>> cache-tree will help you when you do so, but spending the cycles
>> necessary to compute a fully populated cache-tree, only to let it
>> degrade over time until you are ready to write it out as a tree,
>> somehow sounds like asking for a duplicated work upfront.
>
> As I understand it, the cache-tree extension was originally designed to
> speed up writing the tree later.  However, as Karsten Blees's work (and
> my own tests) have shown, it also speeds up git status.  I use git
> status a lot while working, and I've talked to a few others who do the
> same.  So I think it's worth spending extra time when switching branches
> to have a good working experience within that branch.

You are reading the history of the subsystem correctly.  Since
b65982b6 (Optimize "diff-index --cached" using cache-tree,
2009-05-20), having an undamaged cache-tree does help with "git
status" as well.

> In the new version of the patchset (which I'll post shortly), I've added
> an option WRITE_TREE_REPAIR, which does all of the work to compute a new
> tree object, but only adds it to the cache-tree if it already exists
> on-disk.  This is a little wasteful for the reason that you note.  So if
> you would like, I could add a config option to skip it.  But I think it
> is a good default.

OK, sounds good.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-05 Thread David Turner
On Tue, 2014-07-01 at 13:15 -0700, Junio C Hamano wrote:
> David Turner  writes:
> 
> > When git checkout checks out a branch, create or update the
> > cache-tree so that subsequent operations are faster.
> >
> > Signed-off-by: David Turner 
> > ---
> >  builtin/checkout.c|  8 
> >  cache-tree.c  |  5 +++--
> >  t/t0090-cache-tree.sh | 15 ++-
> >  3 files changed, 25 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/checkout.c b/builtin/checkout.c
> > index 07cf555..a023a86 100644
> > --- a/builtin/checkout.c
> > +++ b/builtin/checkout.c
> > @@ -553,6 +553,14 @@ static int merge_working_tree(const struct 
> > checkout_opts *opts,
> > }
> > }
> >  
> > +   if (!active_cache_tree)
> > +   active_cache_tree = cache_tree();
> > +
> > +   if (!cache_tree_fully_valid(active_cache_tree))
> > +   cache_tree_update(active_cache_tree,
> > + (const struct cache_entry * const 
> > *)active_cache,
> > + active_nr, 0);
> > +
> 
> This looks much better than the previous round, but it still does
> allow verify_cache() to throw noises against unmerged entries in the
> cache, as WRITE_TREE_SILENT flag is not passed down, no?
> 
>   $ git checkout master^0
> $ git am $this_message
> $ make
> $ edit builtin/checkout.c ;# make changes to the above lines
> $ ./git checkout -m master^0
> x   builtin/checkout.c: unmerged 
> (972c8a7b28f16f88475268f9a714048c228e69db)
> x   builtin/checkout.c: unmerged 
> (f1dc56e55f7b2200412142b10517458ccfda2952)
> x   builtin/checkout.c: unmerged 
> (3b9753ba8c19e7dfe6e922f30eb85c83a92a4596)
> M   builtin/checkout.c
> Warning: you are leaving 1 commit behind, not connected to
> any of your branches:
> 
>   25fab54 cache-tree: Create/update cache-tree on checkout
> 
> Switched to branch 'master'
> 
> Passing WRITE_TREE_SILENT in the flags parameter will get rid of the
> conflict notice output from the above.
> 
> The user is not interested in writing a brand new tree object at all
> in this case, so it feels wrong to actually let the call chain go
> down to update_one() and create new tree objects.
> 
>   Side note.  And passing WRITE_TREE_DRY_RUN is not a good
>   solution either, because a later write_cache_as_tree() will
>   not create the necessary tree object once you stuff a tree
>   object name in the cache-tree.
> 
> What we want in this code path is a way to repair a sub cache_tree
> if it can be repaired without creating a new tree object and
> otherwise leave that part invalid.  The existing cache-tree
> framework is not prepared to do that kind of thing.  It wants to
> start from the bottom and percolate things up, computing levels
> nearer to the top-level only after it fully created the trees for
> deeper levels, because it is meant to be used only when we really
> want to write out trees.  We may want to reuse update_one() but
> 
> I am not convinced that doing an equivalent of write-tree when you
> switch branches is the right approach in the first place.  You will
> eventually write it out as a tree, and having a relatively undamaged
> cache-tree will help you when you do so, but spending the cycles
> necessary to compute a fully populated cache-tree, only to let it
> degrade over time until you are ready to write it out as a tree,
> somehow sounds like asking for a duplicated work upfront.

As I understand it, the cache-tree extension was originally designed to
speed up writing the tree later.  However, as Karsten Blees's work (and
my own tests) have shown, it also speeds up git status.  I use git
status a lot while working, and I've talked to a few others who do the
same.  So I think it's worth spending extra time when switching branches
to have a good working experience within that branch.

In the new version of the patchset (which I'll post shortly), I've added
an option WRITE_TREE_REPAIR, which does all of the work to compute a new
tree object, but only adds it to the cache-tree if it already exists
on-disk.  This is a little wasteful for the reason that you note.  So if
you would like, I could add a config option to skip it.  But I think it
is a good default.

Does this seem OK to you, assuming the implementation is good? 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-01 Thread David Turner
On Tue, 2014-07-01 at 14:03 -0700, Junio C Hamano wrote:
> David Turner  writes:
> 
> > On Tue, 2014-07-01 at 06:16 +0200, Torsten Bögershausen wrote:
> >> diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
> >> index 6c33e28..7c60675 100755
> >> --- a/t/t0090-cache-tree.sh
> >> +++ b/t/t0090-cache-tree.sh
> >> @@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
> >> cache-tree' '
> >>test_shallow_cache_tree
> >>   '
> >>   
> >> -test_expect_failure 'checkout gives cache-tree' '
> >> +test_expect_success 'checkout gives cache-tree' '
> >> +  git tag current
> >>git checkout HEAD^ &&
> >>test_shallow_cache_tree
> >> 
> >> The && chainis broken here.
> >> Does the test now pass, because "git tag" is added ?
> >
> > The tag does not cause the cache-tree to be created, so git tag does not
> > cause the test to pass.
> 
> That does not explain why it is a good idea to declare success of
> this test if this new "git tag current" fails here for whatever
> reason (e.g. somebody updated "git tag" for a reason that is
> completely unrelated to cache-tree and made it segfault without
> creating the "current" tag).

Indeed; that's why the latest version includes &&.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-01 Thread Junio C Hamano
David Turner  writes:

> When git checkout checks out a branch, create or update the
> cache-tree so that subsequent operations are faster.
>
> Signed-off-by: David Turner 
> ---

Could you number your patches e.g. [PATCH v4 1/3] and/or summarize
below the three-dash line what got updated since the last round?
The readers can guess without when one is sending a reroll every
other day or less frequently, but with rerolls more often than that,
it gets unwieldy to check which points raised during the review have
been addressed.

Thanks.

>  builtin/checkout.c|  8 
>  cache-tree.c  |  5 +++--
>  t/t0090-cache-tree.sh | 19 ---
>  3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 07cf555..a023a86 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -553,6 +553,14 @@ static int merge_working_tree(const struct checkout_opts 
> *opts,
>   }
>   }
>  
> + if (!active_cache_tree)
> + active_cache_tree = cache_tree();
> +
> + if (!cache_tree_fully_valid(active_cache_tree))
> + cache_tree_update(active_cache_tree,
> +   (const struct cache_entry * const 
> *)active_cache,
> +   active_nr, 0);
> +
>   if (write_cache(newfd, active_cache, active_nr) ||
>   commit_locked_index(lock_file))
>   die(_("unable to write new index file"));
> diff --git a/cache-tree.c b/cache-tree.c
> index 7fa524a..28d2356 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -612,9 +612,10 @@ int write_cache_as_tree(unsigned char *sha1, int flags, 
> const char *prefix)
>   cache_tree_find(active_cache_tree, prefix);
>   if (!subtree)
>   return WRITE_TREE_PREFIX_ERROR;
> - hashcpy(sha1, subtree->sha1);
> + if (sha1)
> + hashcpy(sha1, subtree->sha1);
>   }
> - else
> + else if (sha1)
>   hashcpy(sha1, active_cache_tree->sha1);
>  
>   if (0 <= newfd)
> diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
> index 6c33e28..98fb1ab 100755
> --- a/t/t0090-cache-tree.sh
> +++ b/t/t0090-cache-tree.sh
> @@ -44,14 +44,14 @@ test_expect_success 'read-tree HEAD establishes 
> cache-tree' '
>  
>  test_expect_success 'git-add invalidates cache-tree' '
>   test_when_finished "git reset --hard; git read-tree HEAD" &&
> - echo "I changed this file" > foo &&
> + echo "I changed this file" >foo &&
>   git add foo &&
>   test_invalid_cache_tree
>  '
>  
>  test_expect_success 'update-index invalidates cache-tree' '
>   test_when_finished "git reset --hard; git read-tree HEAD" &&
> - echo "I changed this file" > foo &&
> + echo "I changed this file" >foo &&
>   git update-index --add foo &&
>   test_invalid_cache_tree
>  '
> @@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
> cache-tree' '
>   test_shallow_cache_tree
>  '
>  
> -test_expect_failure 'checkout gives cache-tree' '
> +test_expect_success 'checkout gives cache-tree' '
> + git tag current &&
>   git checkout HEAD^ &&
>   test_shallow_cache_tree
>  '
>  
> +test_expect_success 'checkout -b gives cache-tree' '
> + git checkout current &&
> + git checkout -b prev HEAD^ &&
> + test_shallow_cache_tree
> +'
> +
> +test_expect_success 'checkout -B gives cache-tree' '
> + git checkout current &&
> + git checkout -B prev HEAD^ &&
> + test_shallow_cache_tree
> +'
> +
>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-01 Thread Junio C Hamano
David Turner  writes:

> On Tue, 2014-07-01 at 06:16 +0200, Torsten Bögershausen wrote:
>> diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
>> index 6c33e28..7c60675 100755
>> --- a/t/t0090-cache-tree.sh
>> +++ b/t/t0090-cache-tree.sh
>> @@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
>> cache-tree' '
>>  test_shallow_cache_tree
>>   '
>>   
>> -test_expect_failure 'checkout gives cache-tree' '
>> +test_expect_success 'checkout gives cache-tree' '
>> +git tag current
>>  git checkout HEAD^ &&
>>  test_shallow_cache_tree
>> 
>> The && chainis broken here.
>> Does the test now pass, because "git tag" is added ?
>
> The tag does not cause the cache-tree to be created, so git tag does not
> cause the test to pass.

That does not explain why it is a good idea to declare success of
this test if this new "git tag current" fails here for whatever
reason (e.g. somebody updated "git tag" for a reason that is
completely unrelated to cache-tree and made it segfault without
creating the "current" tag).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-01 Thread Junio C Hamano
David Turner  writes:

> When git checkout checks out a branch, create or update the
> cache-tree so that subsequent operations are faster.
>
> Signed-off-by: David Turner 
> ---
>  builtin/checkout.c|  8 
>  cache-tree.c  |  5 +++--
>  t/t0090-cache-tree.sh | 15 ++-
>  3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 07cf555..a023a86 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -553,6 +553,14 @@ static int merge_working_tree(const struct checkout_opts 
> *opts,
>   }
>   }
>  
> + if (!active_cache_tree)
> + active_cache_tree = cache_tree();
> +
> + if (!cache_tree_fully_valid(active_cache_tree))
> + cache_tree_update(active_cache_tree,
> +   (const struct cache_entry * const 
> *)active_cache,
> +   active_nr, 0);
> +

This looks much better than the previous round, but it still does
allow verify_cache() to throw noises against unmerged entries in the
cache, as WRITE_TREE_SILENT flag is not passed down, no?

$ git checkout master^0
$ git am $this_message
$ make
$ edit builtin/checkout.c ;# make changes to the above lines
$ ./git checkout -m master^0
x   builtin/checkout.c: unmerged (972c8a7b28f16f88475268f9a714048c228e69db)
x   builtin/checkout.c: unmerged (f1dc56e55f7b2200412142b10517458ccfda2952)
x   builtin/checkout.c: unmerged (3b9753ba8c19e7dfe6e922f30eb85c83a92a4596)
M   builtin/checkout.c
Warning: you are leaving 1 commit behind, not connected to
any of your branches:

  25fab54 cache-tree: Create/update cache-tree on checkout

Switched to branch 'master'

Passing WRITE_TREE_SILENT in the flags parameter will get rid of the
conflict notice output from the above.

The user is not interested in writing a brand new tree object at all
in this case, so it feels wrong to actually let the call chain go
down to update_one() and create new tree objects.

Side note.  And passing WRITE_TREE_DRY_RUN is not a good
solution either, because a later write_cache_as_tree() will
not create the necessary tree object once you stuff a tree
object name in the cache-tree.

What we want in this code path is a way to repair a sub cache_tree
if it can be repaired without creating a new tree object and
otherwise leave that part invalid.  The existing cache-tree
framework is not prepared to do that kind of thing.  It wants to
start from the bottom and percolate things up, computing levels
nearer to the top-level only after it fully created the trees for
deeper levels, because it is meant to be used only when we really
want to write out trees.  We may want to reuse update_one() but

I am not convinced that doing an equivalent of write-tree when you
switch branches is the right approach in the first place.  You will
eventually write it out as a tree, and having a relatively undamaged
cache-tree will help you when you do so, but spending the cycles
necessary to compute a fully populated cache-tree, only to let it
degrade over time until you are ready to write it out as a tree,
somehow sounds like asking for a duplicated work upfront.

By the way, you seem to have touched write_cache_as_tree() in the
same patch, but I am not sure what makes the change necessary.  I do
not see a new caller to it that passes a NULL to its sha1 parameter.

> diff --git a/cache-tree.c b/cache-tree.c
> index 7fa524a..28d2356 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -612,9 +612,10 @@ int write_cache_as_tree(unsigned char *sha1, int flags, 
> const char *prefix)
>   cache_tree_find(active_cache_tree, prefix);
>   if (!subtree)
>   return WRITE_TREE_PREFIX_ERROR;
> - hashcpy(sha1, subtree->sha1);
> + if (sha1)
> + hashcpy(sha1, subtree->sha1);
>   }
> - else
> + else if (sha1)
>   hashcpy(sha1, active_cache_tree->sha1);
>  
>   if (0 <= newfd)
> diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
> index 6c33e28..7c60675 100755
> --- a/t/t0090-cache-tree.sh
> +++ b/t/t0090-cache-tree.sh
> @@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
> cache-tree' '
>   test_shallow_cache_tree
>  '
>  
> -test_expect_failure 'checkout gives cache-tree' '
> +test_expect_success 'checkout gives cache-tree' '
> + git tag current
>   git checkout HEAD^ &&
>   test_shallow_cache_tree
>  '
>  
> +test_expect_success 'checkout -b gives cache-tree' '
> + git checkout current &&
> + git checkout -b prev HEAD^ &&
> + test_shallow_cache_tree
> +'
> +
> +test_expect_success 'checkout -B gives cache-tree' '
> + git checkout current &&
> + git checkout -B prev HEAD^ &&
> + test_shallow_cache_tree
> +'
> +
>  test_done
--
To unsubscr

Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-07-01 Thread David Turner
On Tue, 2014-07-01 at 06:16 +0200, Torsten Bögershausen wrote:
> diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
> index 6c33e28..7c60675 100755
> --- a/t/t0090-cache-tree.sh
> +++ b/t/t0090-cache-tree.sh
> @@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
> cache-tree' '
>   test_shallow_cache_tree
>   '
>   
> -test_expect_failure 'checkout gives cache-tree' '
> +test_expect_success 'checkout gives cache-tree' '
> + git tag current
>   git checkout HEAD^ &&
>   test_shallow_cache_tree
> 
> The && chainis broken here.
> Does the test now pass, because "git tag" is added ?

The tag does not cause the cache-tree to be created, so git tag does not
cause the test to pass.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-06-30 Thread Torsten Bögershausen


diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index 6c33e28..7c60675 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -85,9 +85,22 @@ test_expect_success 'reset --hard without index gives 
cache-tree' '
test_shallow_cache_tree
 '
 
-test_expect_failure 'checkout gives cache-tree' '

+test_expect_success 'checkout gives cache-tree' '
+   git tag current
git checkout HEAD^ &&
test_shallow_cache_tree

The && chainis broken here.
Does the test now pass, because "git tag" is added ?
In this case: does it may make sense the keep the old one as it is
an  add a new test case  like this ?

+test_expect_success 'tag and checkout gives cache-tree' '

[]
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-06-30 Thread Junio C Hamano
Duy Nguyen  writes:

> I wonder if we should do this in !opts->force code path only. In the
> opts->force code path we could use prime_cache_tree() (like
> read-tree), which is supposedly faster...

Nobody sane should be constantly running "checkout -f", so even if
priming from existing tree objects were faster, it would be adding
complexity to the code to optimize for a wrong case.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-06-29 Thread Junio C Hamano
David Turner  writes:

> When git checkout checks out a branch, create or update the
> cache-tree so that subsequent operations are faster.
>
> Signed-off-by: David Turner 
> ---
>  builtin/checkout.c|  4 
>  cache-tree.c  | 22 --
>  cache-tree.h  |  1 +
>  t/t0090-cache-tree.sh | 15 ++-
>  4 files changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 07cf555..df791e8 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -553,6 +553,10 @@ static int merge_working_tree(const struct checkout_opts 
> *opts,
>   }
>   }
>  
> + if (write_cache_as_tree(NULL, WRITE_TREE_DO_NOT_WRITE, "")) {
> + warn("Unable to write cache_tree");
> + }
> +

This is not even warn-worthy event, isn't it?  "checkout -m other"
would attempt a three-way merge to replay your local changes
relative to your current HEAD in the context of "other" branch and
will leave conflicts in the index and in the working tree, and it is
perfectly a normal thing that you cannot write the index in such a
state as a tree object.  Perhaps you should check if the index is
unmerged before even attempting to compute the cache tree.

Also it looks very strange that write-cache-as-tree, whose *PRIMARY*
function is to write a tree, receives a "DO NOT WRITE" option here.
I understand that you want a new interface into the cache-tree
subsystem that only updates the cache-tree but it somehow smells
like a sloppy/lazy refactoring that is not done quite right.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] cache-tree: Create/update cache-tree on checkout

2014-06-28 Thread Duy Nguyen
On Sat, Jun 28, 2014 at 7:20 AM, David Turner  wrote:
> When git checkout checks out a branch, create or update the
> cache-tree so that subsequent operations are faster.
>
> Signed-off-by: David Turner 
> ---
>  builtin/checkout.c|  4 
>  cache-tree.c  | 22 --
>  cache-tree.h  |  1 +
>  t/t0090-cache-tree.sh | 15 ++-
>  4 files changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 07cf555..df791e8 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -553,6 +553,10 @@ static int merge_working_tree(const struct checkout_opts 
> *opts,
> }
> }
>
> +   if (write_cache_as_tree(NULL, WRITE_TREE_DO_NOT_WRITE, "")) {
> +   warn("Unable to write cache_tree");
> +   }
> +

I wonder if we should do this in !opts->force code path only. In the
opts->force code path we could use prime_cache_tree() (like
read-tree), which is supposedly faster (but may need some tests to be
sure). prime_cache_tree() could be made a bit faster by doing it
during tree traversal in unpack_trees() so we don't have to unpack any
tree objects twice.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html