git reset for index restoration?

2014-05-22 Thread David Turner
If I have a git repository with a clean working tree, and I delete the
index, then I can use git reset (with no arguments) to recreate it.
However, when I do recreate it, it doesn't come back the same.  I have
not analyzed this in detail, but the effect is that commands like git
status take much longer because they must read objects out of a pack
file.  In other words, the index seems to not realize that the index (or
at least most of it) represents the same state as HEAD.  If I do git
reset --hard, the index is restored to the original state (it's
byte-for-byte identical), and the pack file is no longer read.

Before I try to dig in to why this should be so, does anyone happen to
know off the top of their head?  Does this constitute a bug in git, or a
bug in my understanding of git?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:

 If I have a git repository with a clean working tree, and I delete the
 index, then I can use git reset (with no arguments) to recreate it.
 However, when I do recreate it, it doesn't come back the same.  I have
 not analyzed this in detail, but the effect is that commands like git
 status take much longer because they must read objects out of a pack
 file.  In other words, the index seems to not realize that the index (or
 at least most of it) represents the same state as HEAD.  If I do git
 reset --hard, the index is restored to the original state (it's
 byte-for-byte identical), and the pack file is no longer read.

Are you sure it's reading a packfile? I would expect that it is rather
reading the files in the working tree. One of the functions of the index
is to maintain a cache of the sha1 of the working tree file content and
the stat information. Once that is populated, a subsequent git status
can then just lstat() the files to see that its sha1 cache is up to
date.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Elijah Newren
On Thu, May 22, 2014 at 9:22 AM, David Turner dtur...@twopensource.com wrote:
 If I have a git repository with a clean working tree, and I delete the
 index, then I can use git reset (with no arguments) to recreate it.
 However, when I do recreate it, it doesn't come back the same.  I have
 not analyzed this in detail, but the effect is that commands like git
 status take much longer because they must read objects out of a pack
 file.  In other words, the index seems to not realize that the index (or
 at least most of it) represents the same state as HEAD.  If I do git
 reset --hard, the index is restored to the original state (it's
 byte-for-byte identical), and the pack file is no longer read.

 Before I try to dig in to why this should be so, does anyone happen to
 know off the top of their head?  Does this constitute a bug in git, or a
 bug in my understanding of git?

It's not a bug.  The index has additional stat-info it tracks about
files -- inode number, mtime, etc. -- so that it can quickly check
whether files are up to date without comparing full file contents in
the working copy to the relevant version from .git/objects.

'git reset' means make the index match the commit in HEAD.  It implies
nothing about the working copy files, which could be different.
Although you say that you have a clean working tree, git doesn't check
to verify that, so it has to treat these files as stat-dirty until the
next operation (e.g. git status) fills these in -- an operation that
involves full comparison of the files in the working copy with the
relevant version of the file from under .git/objects. (You may find
'git update-index --refresh' helpful here.)

git reset --hard means not only make the index match the commit in
HEAD, but change files in the working copy to match as well.  In such
a case, git will know that the index matches the working copy (since
it is enforcing it), so it can update all the stat-info in the index
and future git-status operations will be fast.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
 On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
 
  If I have a git repository with a clean working tree, and I delete the
  index, then I can use git reset (with no arguments) to recreate it.
  However, when I do recreate it, it doesn't come back the same.  I have
  not analyzed this in detail, but the effect is that commands like git
  status take much longer because they must read objects out of a pack
  file.  In other words, the index seems to not realize that the index (or
  at least most of it) represents the same state as HEAD.  If I do git
  reset --hard, the index is restored to the original state (it's
  byte-for-byte identical), and the pack file is no longer read.
 
 Are you sure it's reading a packfile?

Well, it's calling inflate(), and strace says it is reading
e.g. .git/objects/pack/pack-{idx,pack}.

So, I would say so.




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 09:46 -0700, Elijah Newren wrote:
 On Thu, May 22, 2014 at 9:22 AM, David Turner dtur...@twopensource.com 
 wrote:
  If I have a git repository with a clean working tree, and I delete the
  index, then I can use git reset (with no arguments) to recreate it.
  However, when I do recreate it, it doesn't come back the same.  I have
  not analyzed this in detail, but the effect is that commands like git
  status take much longer because they must read objects out of a pack
  file.  In other words, the index seems to not realize that the index (or
  at least most of it) represents the same state as HEAD.  If I do git
  reset --hard, the index is restored to the original state (it's
  byte-for-byte identical), and the pack file is no longer read.
 
  Before I try to dig in to why this should be so, does anyone happen to
  know off the top of their head?  Does this constitute a bug in git, or a
  bug in my understanding of git?
 
 It's not a bug.  The index has additional stat-info it tracks about
 files -- inode number, mtime, etc. -- so that it can quickly check
 whether files are up to date without comparing full file contents in
 the working copy to the relevant version from .git/objects.
 
 'git reset' means make the index match the commit in HEAD. 

Sure, so why does git status the need to read the pack file, given that
we already know that the index matches HEAD?

  It implies
 nothing about the working copy files, which could be different.
 Although you say that you have a clean working tree, git doesn't check
 to verify that, so it has to treat these files as stat-dirty until the
 next operation (e.g. git status) fills these in -- an operation that
 involves full comparison of the files in the working copy with the
 relevant version of the file from under .git/objects. (You may find
 'git update-index --refresh' helpful here.)

In fact, git status does not write the index (at least in this context).
And what is slow is not (only) checking over the working tree, but
reading the packs.  There should be no need to read files from the ODB
at all, since the index knows those objects' SHA1s, and it can check
those against the working tree's SHA1s.  Interestingly, if strace is to
be believed, git status doesn't even do check the working trees SHA1s
after a git reset; maybe reset already sets the stat bits?

So that's why I'm confused.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote:

 On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
  On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
 
   If I have a git repository with a clean working tree, and I delete the
   index, then I can use git reset (with no arguments) to recreate it.
   However, when I do recreate it, it doesn't come back the same.  I have
   not analyzed this in detail, but the effect is that commands like git
   status take much longer because they must read objects out of a pack
   file.  In other words, the index seems to not realize that the index (or
   at least most of it) represents the same state as HEAD.  If I do git
   reset --hard, the index is restored to the original state (it's
   byte-for-byte identical), and the pack file is no longer read.
 
  Are you sure it's reading a packfile?

 Well, it's calling inflate(), and strace says it is reading
 e.g. .git/objects/pack/pack-{idx,pack}.

 So, I would say so.

That seems odd that we would be spending extra time there. We do
inflate() the trees in order to diff the index against HEAD, but we
shouldn't need to inflate any blobs.

Here it is for me (on linux.git):

  [before, warm cache]
  $ time perf record -q git status /dev/null
  real0m0.192s
  user0m0.080s
  sys 0m0.108s

  $ perf report | grep -v '#' | head -5
 7.46%  git  [kernel.kallsyms]   [k] __d_lookup_rcu
 4.55%  git  libz.so.1.2.8   [.] inflate
 3.53%  git  libc-2.18.so[.] __memcmp_sse4_1
 3.46%  git  [kernel.kallsyms]   [k] security_inode_getattr
 3.29%  git  git [.] memihash

  $ time git reset
  real0m0.080s
  user0m0.036s
  sys 0m0.040s

So status is pretty quick, and the time is going to lstat in the kernel,
and some tree inflation. Reset is fast, because it has nothing much to
do. Now let's kill off the index's stat cache:

  $ rm .git/index
  $ time perf record -q git reset
  real0m0.967s
  user0m0.780s
  sys 0m0.180s

That took a while. What was it doing?

  $ perf report | grep -v '#' | head -5
 3.23%  git  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
 1.74%  git  libcrypto.so.1.0.0  [.] 0x0007e010
 1.60%  git  [kernel.kallsyms]   [k] __d_lookup_rcu
 1.51%  git  [kernel.kallsyms]   [k] page_fault
 1.44%  git  libc-2.18.so[.] __memcmp_sse4_1   

Reading files and sha1. We hash the working-tree files here (reset
doesn't technically need to refresh the index from the working tree to
copy entries from HEAD into the index, but it does it so it can do fancy
things like tell you about which files are now out-of-date).

Now how does stat fare after this?

  $ time perf record -q git status /dev/null
  real0m0.189s
  user0m0.088s
  sys 0m0.096s

Looks about the same as before to me.

Note that if you use read-tree instead of reset, it _just_ loads the
index, and doesn't touch the working tree. If you then run git status,
then _that_ command has to refresh the index, and it will pay the
hashing cost. Like:

  $ rm .git/index
  $ time git read-tree HEAD
  real0m0.084s
  user0m0.064s
  sys 0m0.016s
  $ time git status /dev/null
  real0m0.833s
  user0m0.712s
  sys 0m0.112s

All of this is behaving as I would expect. Can you show us a set of
commands that deviate from this?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 02:17:22PM -0400, David Turner wrote:

 In fact, git status does not write the index (at least in this context).
 And what is slow is not (only) checking over the working tree, but
 reading the packs.  There should be no need to read files from the ODB
 at all, since the index knows those objects' SHA1s, and it can check
 those against the working tree's SHA1s.  Interestingly, if strace is to
 be believed, git status doesn't even do check the working trees SHA1s
 after a git reset; maybe reset already sets the stat bits?

Keep in mind that status is running _two_ diffs. One between HEAD and
the index, and one between the index and the working tree.

Your timings might be more interesting just run git diff-index --cached
HEAD, which is the index-HEAD half of that, ignoring the working tree
state entirely.

Naively, running that diff will involve walking the trees and the index
in parallel, which means opening a bunch of tree objects. The cache-tree
index extension, however, records tree sha1s, which means we can avoid
recursing into parts of the comparison.

Comparing the two:

  $ rm .git/index
  $ git reset
  $ time git diff-index --cached HEAD
  real0m0.044s
  user0m0.040s
  sys 0m0.000s

  $ git reset --hard
  $ time git diff-index --cached HEAD
  real0m0.012s
  user0m0.008s
  sys 0m0.000s

does show some improvement. Perhaps git reset is not writing out the
cache-tree extension?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 14:39 -0400, Jeff King wrote:
 does show some improvement. Perhaps git reset is not writing out the
 cache-tree extension?
 
Yes, that seems to be exactly what is going on; the two indexes are
identical up to the point where the TREE extension appears.

Thanks for clearing that up!  

Do you think that this is a bug in git reset?

(I'm also going to reply to your other mail because it seems like it
maybe points up some related bug)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 03:07:49PM -0400, David Turner wrote:

 On Thu, 2014-05-22 at 14:39 -0400, Jeff King wrote:
  does show some improvement. Perhaps git reset is not writing out the
  cache-tree extension?
  
 Yes, that seems to be exactly what is going on; the two indexes are
 identical up to the point where the TREE extension appears.
 
 Thanks for clearing that up!  
 
 Do you think that this is a bug in git reset?

Possibly. There is a call to prime_cache_tree in builtin/reset.c, which
looks like it should trigger during a mixed or hard reset (and
without arguments, you should have a mixed reset). But it doesn't seem
to get called. I haven't traced it further.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 14:23 -0400, Jeff King wrote:
 On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote:
 
  On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
   On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
  
If I have a git repository with a clean working tree, and I delete the
index, then I can use git reset (with no arguments) to recreate it.
However, when I do recreate it, it doesn't come back the same.  I have
not analyzed this in detail, but the effect is that commands like git
status take much longer because they must read objects out of a pack
file.  In other words, the index seems to not realize that the index (or
at least most of it) represents the same state as HEAD.  If I do git
reset --hard, the index is restored to the original state (it's
byte-for-byte identical), and the pack file is no longer read.
  
   Are you sure it's reading a packfile?
 
  Well, it's calling inflate(), and strace says it is reading
  e.g. .git/objects/pack/pack-{idx,pack}.
 
  So, I would say so.
 
 That seems odd that we would be spending extra time there. We do
 inflate() the trees in order to diff the index against HEAD, but we
 shouldn't need to inflate any blobs.

I just did a fresh clone of linux.git, and it doesn't have TREE (and
spends time in inflate).  Then I did a reset --hard, and it gained TREE
(and stopped spending time in inflate).  Then I did a checkout -b, and
TREE was lost again (time in inflate).  hard reset again (to HEAD, so no
change), and TREE returns and status is fast again.

Committing doesn't seem to create a TREE extension where one doesn't
exist.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Jeff King
[+cc Junio for cache-tree expertise]

On Thu, May 22, 2014 at 03:09:59PM -0400, Jeff King wrote:

   does show some improvement. Perhaps git reset is not writing out the
   cache-tree extension?
 [...]
 
 Possibly. There is a call to prime_cache_tree in builtin/reset.c, which
 looks like it should trigger during a mixed or hard reset (and
 without arguments, you should have a mixed reset). But it doesn't seem
 to get called. I haven't traced it further.

So here's what's happening. The prime_cache_tree call is in reset_index,
and was added by:

  commit 6c52ec8a9ab48b50fc8bf9559467d5a4cf7eee3b
  Author: Thomas Rast tr...@student.ethz.ch
  Date:   Tue Dec 6 18:43:39 2011 +0100
  
  reset: update cache-tree data when appropriate
  
  In the case of --mixed and --hard, we throw away the old index and
  rebuild everything from the tree argument (or HEAD).  So we have an
  opportunity here to fill in the cache-tree data, just as read-tree
  did.
  
  Signed-off-by: Thomas Rast tr...@student.ethz.ch
  Signed-off-by: Junio C Hamano gits...@pobox.com

But that was counteracted by:

  commit 3fde386a40f38dbaa684c17603e71909b862d021
  Author: Martin von Zweigbergk martinv...@gmail.com
  Date:   Mon Jan 14 21:47:51 2013 -0800
  
  reset [--mixed]: use diff-based reset whether or not pathspec was given
  
  Thanks to b65982b (Optimize diff-index --cached using cache-tree,
  2009-05-20), resetting with paths is much faster than resetting
  without paths. Some timings for the linux-2.6 repo to illustrate this
  (best of five, warm cache):
  
  reset   reset .
  real0m0.219s0m0.080s
  user0m0.140s0m0.040s
  sys 0m0.070s0m0.030s
  
  These two commands should do the same thing, so instead of having the
  user type the trailing  . to get the faster do_diff_cache()-based
  implementation, always use it when doing a mixed reset, with or
  without paths (so git reset $rev would also be faster).
  
  Timing git reset shows that it indeed becomes as fast as
  git reset . after this patch.
  
  Signed-off-by: Martin von Zweigbergk martinv...@gmail.com
  Signed-off-by: Junio C Hamano gits...@pobox.com

We never call reset_index now, because we handle it via diff.  We could
call prime_cache_tree in this case, but I'm not sure if that is a good
idea, because it primes it from scratch (and so it opens up all those
trees that we are trying to avoid touching). I'm not sure if there's an
easy way to update it incrementally; I don't know the cache-tree code
very well.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 [+cc Junio for cache-tree expertise]
 ...
 We never call reset_index now, because we handle it via diff.  We could
 call prime_cache_tree in this case, but I'm not sure if that is a good
 idea, because it primes it from scratch (and so it opens up all those
 trees that we are trying to avoid touching). I'm not sure if there's an
 easy way to update it incrementally; I don't know the cache-tree code
 very well.

The cache-tree is designed to start in a well-populated state,
allowing you to efficiently smudge the part you touched by
invalidating while keeping the parts you haven't touched intact.

What is missing in its API is a more fine-grained support to let us
say it has degraded too much and we need to bring it into a
well-populated state again for it to be truly useful as an
optimization.  There are only two modes of support to revive a
degraded cache-tree, one being write_cache_as_tree(), in which case
we have to compute necessary tree object names anyway (so there is
no point discarding the result of the computation), and the other
being calls to prime-cache-tree, in which we happen to know that the
whole index contents must match the whole tree structure represented
by one tree object.

Both aim to restore the cache-tree into a fully-populated state, and
there is no support to populate it well enough by doing anything
incremental.  You can call write-tree side incremental, because it
does reuse what is still valid without recomputing tree objects for
them---but the result is a fully-populated state.

Adding a more fine-grain support is not against the overall design,
but it was unclear what such additional API functions should look
like, and where we can call them safely, at least back when we were
actively improving it.  Two that comes to my mind are:

 - We know that the subtrees down in this directory are degraded too
   much; write-tree only the subtrees that correspond to this
   directory without restoring other parts of the tree.

 - We just populated the index with the subtrees in this directory
   and know that they should match the tree hierarchy exactly.
   prime-cache-tree only the parts without affecting other parts of
   the tree.

As with calls to existing (whole-tree) prime-cache-tree, the latter
is an error-prone optimization---I think we had cases where we said
after this operation, we know that the index must exactly match the
tree we used to muck with the index and added a call, and later
discovered that must exactly match was not true.

The former forces recomputation, so there is much less safety
concern.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 14:34 -0700, Junio C Hamano wrote:
 Jeff King p...@peff.net writes:
 
  [+cc Junio for cache-tree expertise]
  ...
  We never call reset_index now, because we handle it via diff.  We could
  call prime_cache_tree in this case, but I'm not sure if that is a good
  idea, because it primes it from scratch (and so it opens up all those
  trees that we are trying to avoid touching). I'm not sure if there's an
  easy way to update it incrementally; I don't know the cache-tree code
  very well.
 
 The cache-tree is designed to start in a well-populated state,
 allowing you to efficiently smudge the part you touched by
 invalidating while keeping the parts you haven't touched intact.

As far as I can tell, the cache-tree does not in fact ever get into a
well-populated state (that is, it does not exist at all) under ordinary
git operation except by git reset --hard.  Perhaps this was already
clear from the previous traffic on the thread, but I wanted to make sure
Junio was also aware of this.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 On Thu, 2014-05-22 at 14:34 -0700, Junio C Hamano wrote:
 Jeff King p...@peff.net writes:
 
  [+cc Junio for cache-tree expertise]
  ...
  We never call reset_index now, because we handle it via diff.  We could
  call prime_cache_tree in this case, but I'm not sure if that is a good
  idea, because it primes it from scratch (and so it opens up all those
  trees that we are trying to avoid touching). I'm not sure if there's an
  easy way to update it incrementally; I don't know the cache-tree code
  very well.
 
 The cache-tree is designed to start in a well-populated state,
 allowing you to efficiently smudge the part you touched by
 invalidating while keeping the parts you haven't touched intact.

 As far as I can tell, the cache-tree does not in fact ever get into a
 well-populated state (that is, it does not exist at all) under ordinary
 git operation except by git reset --hard.  Perhaps this was already
 clear from the previous traffic on the thread, but I wanted to make sure
 Junio was also aware of this.

Yes.  As I said, that should not usually be a problem for those who
do the real work (read: commit), at which time write-tree will fully
populate the cache-tree.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 Yes.  As I said, that should not usually be a problem for those who
 do the real work (read: commit), at which time write-tree will fully
 populate the cache-tree.

 Git commit does not in fact populate the cache-tree.

If that is the case, we must have broken the write-tree codepath
over time.

Of course, git commit foo will need to prepare a temporary index
where only the entry foo is different from the HEAD version, write
the tree to create a commit, but we do not write out the main index
as a tree after updating the same entry foo in it (there may be
other changes relative to HEAD), so its cache-tree is only
invalidated at foo when we updating the entry and we do not spend
extra cycles to repopulate it.

But at least my understanding has been that git commit (no partial
commit, write the whole index as a commit) which uses the git
write-tree machinery knows which subtree has what tree object name
and populates the cache-tree fully.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 David Turner dtur...@twopensource.com writes:

 Yes.  As I said, that should not usually be a problem for those who
 do the real work (read: commit), at which time write-tree will fully
 populate the cache-tree.

 Git commit does not in fact populate the cache-tree.

 If that is the case, we must have broken the write-tree codepath
 over time.

 Of course, git commit foo will need to prepare a temporary index
 where only the entry foo is different from the HEAD version, write
 the tree to create a commit, but we do not write out the main index
 as a tree after updating the same entry foo in it (there may be
 other changes relative to HEAD), so its cache-tree is only
 invalidated at foo when we updating the entry and we do not spend
 extra cycles to repopulate it.

 But at least my understanding has been that git commit (no partial
 commit, write the whole index as a commit) which uses the git
 write-tree machinery knows which subtree has what tree object name
 and populates the cache-tree fully.

... and the incrementally repair Peff talks about would be to
cover more cases where we may know (either because we have already
computed it to write out a subtree, or we have just read from a
known tree to populate a part of the index and we know the paths in
the index that correspond to that subtree are exactly the same ones
as found in the tree we read from) parts of the cache-tree can be
updated with tree object names for subtrees, but we don't do
anything right now.  git commit foo (where foo is a directory)
may be able to say The cache-tree entry for 'foo' can be updated
with the tree object of the new HEAD:foo because we know they must
match in the main index, for example.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 But at least my understanding has been that git commit (no partial
 commit, write the whole index as a commit) which uses the git
 write-tree machinery knows which subtree has what tree object name
 and populates the cache-tree fully.

Here is what I tried just now.

$ rm .git/index
$ git read-tree HEAD HEAD

Note that a single-tree read-tree will populate the cache-tree and
that is why I am forcing switch branches 2-way read-tree here,
which I know will discard the cache-tree fully.

$ ls -l .git/index
-rw-r- 1 jch eng 249440 May 22 15:20 .git/index
$ git checkout HEAD^0
$ ls -l .git/index
-rw-r- 1 jch eng 249440 May 22 15:21 .git/index

Still the same size, without cache-tree.

$ git write-tree
57361c4add61b638dad1c1c2542edf877f515c48
$ ls -l .git/index
-rw-r- 1 jch eng 254383 May 22 15:21 .git/index

The size differences come from the recomputation of the cache tree.
The result is the same if we replace git write-tree with a
whole-index commit, e.g.

$ git commit --allow-empty -m foo

and test-dump-cache-tree seem to see a fully populated cache-tree
after these steps.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Thu, 2014-05-22 at 15:29 -0700, Junio C Hamano wrote:
 Junio C Hamano gits...@pobox.com writes:
 
  But at least my understanding has been that git commit (no partial
  commit, write the whole index as a commit) which uses the git
  write-tree machinery knows which subtree has what tree object name
  and populates the cache-tree fully.
 
 Here is what I tried just now.
 
 $ rm .git/index
 $ git read-tree HEAD HEAD
 
 Note that a single-tree read-tree will populate the cache-tree and
 that is why I am forcing switch branches 2-way read-tree here,
 which I know will discard the cache-tree fully.
 
 $ ls -l .git/index
 -rw-r- 1 jch eng 249440 May 22 15:20 .git/index
 $ git checkout HEAD^0
 $ ls -l .git/index
 -rw-r- 1 jch eng 249440 May 22 15:21 .git/index
 
 Still the same size, without cache-tree.
 
 $ git write-tree
 57361c4add61b638dad1c1c2542edf877f515c48
 $ ls -l .git/index
 -rw-r- 1 jch eng 254383 May 22 15:21 .git/index
 
 The size differences come from the recomputation of the cache tree.
 The result is the same if we replace git write-tree with a
 whole-index commit, e.g.
 
 $ git commit --allow-empty -m foo
 
 and test-dump-cache-tree seem to see a fully populated cache-tree
 after these steps.

I get the same results as you with git write-tree.  But I do not get the
same results from a whole-index git commit (I tried your exact
command-line).  That is, when I do git commit with no cache-tree in
place, it does not create one. 

To expand: even if git commit did work for me the way it seems to work
for you, I still believe that the cache-tree behavior would be
suboptimal, because every time a user switches branches, they lose their
cache-tree, and thus all of their git status commands are slow until
their first commit. But I am willing to believe that my workflow is
atypical, and that most people commit enough soon after switching
branches.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 ... I still believe that the cache-tree behavior would be
 suboptimal, ...

I do not think anybody doubts that suboptimal-ness in this thread.
As you saw the incremental thing from Peff and my responses to it,
there may be more things we could be doing.  It just has not been at
a ultra high priority, and if we can choose only one change from
possibilities, losing the entire cache-tree upon switching branches,
like in my two-way read-tree example, would be the thing that would
give us the most benefit if we somehow change it.

That however is unfortunately not a low-hanging fruit.  The two-way
merge machinery we use for switching branches wants to populate the
index one entry at a time, without having any point where you can
say OK the result in this subdirectory will exactly match this
subtree which would allow us to say prime the cache tree for that
subdirectory with this tree object.



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread Duy Nguyen
On Fri, May 23, 2014 at 5:18 AM, Junio C Hamano gits...@pobox.com wrote:
 ... and the incrementally repair Peff talks about would be to
 cover more cases where we may know (either because we have already
 computed it to write out a subtree, or we have just read from a
 known tree to populate a part of the index and we know the paths in
 the index that correspond to that subtree are exactly the same ones
 as found in the tree we read from) parts of the cache-tree can be
 updated with tree object names for subtrees, but we don't do
 anything right now.

I wanted to do this but it's hard. For diff --cached (which should
be where we find out and repair cache-tree), we flatten the trees in
traverse_trees() and let unpack-trees figure out the differences
against the index. The's no direct connection between a change and a
tree. Maybe I missed something. David if you are interested in git
status performance, this repairing thing could be important when the
worktree is large because the diff cost increases accordingly if
cache-tree is not fully populated. Empty cache-tree could cost us
nearly the same time we save with avoiding opendir/readdir..
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git reset for index restoration?

2014-05-22 Thread David Turner
On Fri, 2014-05-23 at 06:33 +0700, Duy Nguyen wrote:
 On Fri, May 23, 2014 at 5:18 AM, Junio C Hamano gits...@pobox.com wrote:
  ... and the incrementally repair Peff talks about would be to
  cover more cases where we may know (either because we have already
  computed it to write out a subtree, or we have just read from a
  known tree to populate a part of the index and we know the paths in
  the index that correspond to that subtree are exactly the same ones
  as found in the tree we read from) parts of the cache-tree can be
  updated with tree object names for subtrees, but we don't do
  anything right now.
 
 I wanted to do this but it's hard. For diff --cached (which should
 be where we find out and repair cache-tree), we flatten the trees in
 traverse_trees() and let unpack-trees figure out the differences
 against the index. The's no direct connection between a change and a
 tree. Maybe I missed something. David if you are interested in git
 status performance, this repairing thing could be important when the
 worktree is large because the diff cost increases accordingly if
 cache-tree is not fully populated. Empty cache-tree could cost us
 nearly the same time we save with avoiding opendir/readdir..

I am interested, and I believe I might be able to start looking into it
in a week or two.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html