Re: git reset for index restoration?
On Fri, 2014-05-23 at 06:33 +0700, Duy Nguyen wrote: > On Fri, May 23, 2014 at 5:18 AM, Junio C Hamano wrote: > > ... and the "incrementally repair" Peff talks about would be to > > cover more cases where we may know (either because we have already > > computed it to write out a subtree, or we have just read from a > > known tree to populate a part of the index and we know the paths in > > the index that correspond to that subtree are exactly the same ones > > as found in the tree we read from) parts of the cache-tree can be > > updated with tree object names for subtrees, but we don't do > > anything right now. > > I wanted to do this but it's hard. For "diff --cached" (which should > be where we find out and repair cache-tree), we flatten the trees in > traverse_trees() and let unpack-trees figure out the differences > against the index. The's no direct connection between a change and a > tree. Maybe I missed something. David if you are interested in "git > status" performance, this repairing thing could be important when the > worktree is large because the diff cost increases accordingly if > cache-tree is not fully populated. Empty cache-tree could cost us > nearly the same time we save with avoiding opendir/readdir.. I am interested, and I believe I might be able to start looking into it in a week or two. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Fri, May 23, 2014 at 5:18 AM, Junio C Hamano wrote: > ... and the "incrementally repair" Peff talks about would be to > cover more cases where we may know (either because we have already > computed it to write out a subtree, or we have just read from a > known tree to populate a part of the index and we know the paths in > the index that correspond to that subtree are exactly the same ones > as found in the tree we read from) parts of the cache-tree can be > updated with tree object names for subtrees, but we don't do > anything right now. I wanted to do this but it's hard. For "diff --cached" (which should be where we find out and repair cache-tree), we flatten the trees in traverse_trees() and let unpack-trees figure out the differences against the index. The's no direct connection between a change and a tree. Maybe I missed something. David if you are interested in "git status" performance, this repairing thing could be important when the worktree is large because the diff cost increases accordingly if cache-tree is not fully populated. Empty cache-tree could cost us nearly the same time we save with avoiding opendir/readdir.. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
David Turner writes: > ... I still believe that the cache-tree behavior would be > suboptimal, ... I do not think anybody doubts that "suboptimal"-ness in this thread. As you saw the "incremental" thing from Peff and my responses to it, there may be more things we could be doing. It just has not been at a ultra high priority, and if we can choose only one change from possibilities, losing the entire cache-tree upon switching branches, like in my two-way read-tree example, would be the thing that would give us the most benefit if we somehow change it. That however is unfortunately not a low-hanging fruit. The two-way merge machinery we use for switching branches wants to populate the index one entry at a time, without having any point where you can say "OK the result in this subdirectory will exactly match this subtree" which would allow us to say "prime the cache tree for that subdirectory with this tree object". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 15:29 -0700, Junio C Hamano wrote: > Junio C Hamano writes: > > > But at least my understanding has been that "git commit" (no partial > > commit, write the whole index as a commit) which uses the "git > > write-tree" machinery knows which subtree has what tree object name > > and populates the cache-tree fully. > > Here is what I tried just now. > > $ rm .git/index > $ git read-tree HEAD HEAD > > Note that a single-tree read-tree will populate the cache-tree and > that is why I am forcing "switch branches" 2-way read-tree here, > which I know will discard the cache-tree fully. > > $ ls -l .git/index > -rw-r- 1 jch eng 249440 May 22 15:20 .git/index > $ git checkout HEAD^0 > $ ls -l .git/index > -rw-r- 1 jch eng 249440 May 22 15:21 .git/index > > Still the same size, without cache-tree. > > $ git write-tree > 57361c4add61b638dad1c1c2542edf877f515c48 > $ ls -l .git/index > -rw-r- 1 jch eng 254383 May 22 15:21 .git/index > > The size differences come from the recomputation of the cache tree. > The result is the same if we replace "git write-tree" with a > whole-index commit, e.g. > > $ git commit --allow-empty -m foo > > and test-dump-cache-tree seem to see a fully populated cache-tree > after these steps. I get the same results as you with git write-tree. But I do not get the same results from a whole-index git commit (I tried your exact command-line). That is, when I do git commit with no cache-tree in place, it does not create one. To expand: even if git commit did work for me the way it seems to work for you, I still believe that the cache-tree behavior would be suboptimal, because every time a user switches branches, they lose their cache-tree, and thus all of their git status commands are slow until their first commit. But I am willing to believe that my workflow is atypical, and that most people commit enough soon after switching branches. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
Junio C Hamano writes: > But at least my understanding has been that "git commit" (no partial > commit, write the whole index as a commit) which uses the "git > write-tree" machinery knows which subtree has what tree object name > and populates the cache-tree fully. Here is what I tried just now. $ rm .git/index $ git read-tree HEAD HEAD Note that a single-tree read-tree will populate the cache-tree and that is why I am forcing "switch branches" 2-way read-tree here, which I know will discard the cache-tree fully. $ ls -l .git/index -rw-r- 1 jch eng 249440 May 22 15:20 .git/index $ git checkout HEAD^0 $ ls -l .git/index -rw-r- 1 jch eng 249440 May 22 15:21 .git/index Still the same size, without cache-tree. $ git write-tree 57361c4add61b638dad1c1c2542edf877f515c48 $ ls -l .git/index -rw-r- 1 jch eng 254383 May 22 15:21 .git/index The size differences come from the recomputation of the cache tree. The result is the same if we replace "git write-tree" with a whole-index commit, e.g. $ git commit --allow-empty -m foo and test-dump-cache-tree seem to see a fully populated cache-tree after these steps. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
Junio C Hamano writes: > David Turner writes: > >>> Yes. As I said, that should not usually be a problem for those who >>> do the real work (read: commit), at which time write-tree will fully >>> populate the cache-tree. >> >> Git commit does not in fact populate the cache-tree. > > If that is the case, we must have broken the write-tree codepath > over time. > > Of course, "git commit foo" will need to prepare a temporary index > where only the entry "foo" is different from the HEAD version, write > the tree to create a commit, but we do not write out the main index > as a tree after updating the same entry "foo" in it (there may be > other changes relative to HEAD), so its cache-tree is only > invalidated at "foo" when we updating the entry and we do not spend > extra cycles to repopulate it. > > But at least my understanding has been that "git commit" (no partial > commit, write the whole index as a commit) which uses the "git > write-tree" machinery knows which subtree has what tree object name > and populates the cache-tree fully. ... and the "incrementally repair" Peff talks about would be to cover more cases where we may know (either because we have already computed it to write out a subtree, or we have just read from a known tree to populate a part of the index and we know the paths in the index that correspond to that subtree are exactly the same ones as found in the tree we read from) parts of the cache-tree can be updated with tree object names for subtrees, but we don't do anything right now. "git commit foo" (where "foo" is a directory) may be able to say "The cache-tree entry for 'foo' can be updated with the tree object of the new HEAD:foo because we know they must match in the main index", for example. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
David Turner writes: >> Yes. As I said, that should not usually be a problem for those who >> do the real work (read: commit), at which time write-tree will fully >> populate the cache-tree. > > Git commit does not in fact populate the cache-tree. If that is the case, we must have broken the write-tree codepath over time. Of course, "git commit foo" will need to prepare a temporary index where only the entry "foo" is different from the HEAD version, write the tree to create a commit, but we do not write out the main index as a tree after updating the same entry "foo" in it (there may be other changes relative to HEAD), so its cache-tree is only invalidated at "foo" when we updating the entry and we do not spend extra cycles to repopulate it. But at least my understanding has been that "git commit" (no partial commit, write the whole index as a commit) which uses the "git write-tree" machinery knows which subtree has what tree object name and populates the cache-tree fully. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 14:58 -0700, Junio C Hamano wrote: > David Turner writes: > > > On Thu, 2014-05-22 at 14:34 -0700, Junio C Hamano wrote: > >> Jeff King writes: > >> > >> > [+cc Junio for cache-tree expertise] > >> > ... > >> > We never call reset_index now, because we handle it via diff. We could > >> > call prime_cache_tree in this case, but I'm not sure if that is a good > >> > idea, because it primes it from scratch (and so it opens up all those > >> > trees that we are trying to avoid touching). I'm not sure if there's an > >> > easy way to update it incrementally; I don't know the cache-tree code > >> > very well. > >> > >> The cache-tree is designed to start in a well-populated state, > >> allowing you to efficiently smudge the part you touched by > >> invalidating while keeping the parts you haven't touched intact. > > > > As far as I can tell, the cache-tree does not in fact ever get into a > > well-populated state (that is, it does not exist at all) under ordinary > > git operation except by git reset --hard. Perhaps this was already > > clear from the previous traffic on the thread, but I wanted to make sure > > Junio was also aware of this. > > Yes. As I said, that should not usually be a problem for those who > do the real work (read: commit), at which time write-tree will fully > populate the cache-tree. Git commit does not in fact populate the cache-tree. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
David Turner writes: > On Thu, 2014-05-22 at 14:34 -0700, Junio C Hamano wrote: >> Jeff King writes: >> >> > [+cc Junio for cache-tree expertise] >> > ... >> > We never call reset_index now, because we handle it via diff. We could >> > call prime_cache_tree in this case, but I'm not sure if that is a good >> > idea, because it primes it from scratch (and so it opens up all those >> > trees that we are trying to avoid touching). I'm not sure if there's an >> > easy way to update it incrementally; I don't know the cache-tree code >> > very well. >> >> The cache-tree is designed to start in a well-populated state, >> allowing you to efficiently smudge the part you touched by >> invalidating while keeping the parts you haven't touched intact. > > As far as I can tell, the cache-tree does not in fact ever get into a > well-populated state (that is, it does not exist at all) under ordinary > git operation except by git reset --hard. Perhaps this was already > clear from the previous traffic on the thread, but I wanted to make sure > Junio was also aware of this. Yes. As I said, that should not usually be a problem for those who do the real work (read: commit), at which time write-tree will fully populate the cache-tree. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 14:34 -0700, Junio C Hamano wrote: > Jeff King writes: > > > [+cc Junio for cache-tree expertise] > > ... > > We never call reset_index now, because we handle it via diff. We could > > call prime_cache_tree in this case, but I'm not sure if that is a good > > idea, because it primes it from scratch (and so it opens up all those > > trees that we are trying to avoid touching). I'm not sure if there's an > > easy way to update it incrementally; I don't know the cache-tree code > > very well. > > The cache-tree is designed to start in a well-populated state, > allowing you to efficiently smudge the part you touched by > invalidating while keeping the parts you haven't touched intact. As far as I can tell, the cache-tree does not in fact ever get into a well-populated state (that is, it does not exist at all) under ordinary git operation except by git reset --hard. Perhaps this was already clear from the previous traffic on the thread, but I wanted to make sure Junio was also aware of this. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
Jeff King writes: > [+cc Junio for cache-tree expertise] > ... > We never call reset_index now, because we handle it via diff. We could > call prime_cache_tree in this case, but I'm not sure if that is a good > idea, because it primes it from scratch (and so it opens up all those > trees that we are trying to avoid touching). I'm not sure if there's an > easy way to update it incrementally; I don't know the cache-tree code > very well. The cache-tree is designed to start in a well-populated state, allowing you to efficiently smudge the part you touched by invalidating while keeping the parts you haven't touched intact. What is missing in its API is a more fine-grained support to let us say "it has degraded too much and we need to bring it into a well-populated state again for it to be truly useful as an optimization." There are only two modes of support to revive a degraded cache-tree, one being write_cache_as_tree(), in which case we have to compute necessary tree object names anyway (so there is no point discarding the result of the computation), and the other being calls to prime-cache-tree, in which we happen to know that the whole index contents must match the whole tree structure represented by one tree object. Both aim to restore the cache-tree into a fully-populated state, and there is no support to populate it "well enough" by doing anything incremental. You can call write-tree side incremental, because it does reuse what is still valid without recomputing tree objects for them---but the result is a fully-populated state. Adding a more fine-grain support is not against the overall design, but it was unclear what such additional API functions should look like, and where we can call them safely, at least back when we were actively improving it. Two that comes to my mind are: - We know that the subtrees down in this directory are degraded too much; write-tree only the subtrees that correspond to this directory without restoring other parts of the tree. - We just populated the index with the subtrees in this directory and know that they should match the tree hierarchy exactly. prime-cache-tree only the parts without affecting other parts of the tree. As with calls to existing (whole-tree) prime-cache-tree, the latter is an error-prone optimization---I think we had cases where we said "after this operation, we know that the index must exactly match the tree we used to muck with the index" and added a call, and later discovered that "must exactly match" was not true. The former forces recomputation, so there is much less safety concern. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
[+cc Junio for cache-tree expertise] On Thu, May 22, 2014 at 03:09:59PM -0400, Jeff King wrote: > > > does show some improvement. Perhaps "git reset" is not writing out the > > > cache-tree extension? > [...] > > Possibly. There is a call to prime_cache_tree in builtin/reset.c, which > looks like it should trigger during a "mixed" or "hard" reset (and > without arguments, you should have a mixed reset). But it doesn't seem > to get called. I haven't traced it further. So here's what's happening. The prime_cache_tree call is in reset_index, and was added by: commit 6c52ec8a9ab48b50fc8bf9559467d5a4cf7eee3b Author: Thomas Rast Date: Tue Dec 6 18:43:39 2011 +0100 reset: update cache-tree data when appropriate In the case of --mixed and --hard, we throw away the old index and rebuild everything from the tree argument (or HEAD). So we have an opportunity here to fill in the cache-tree data, just as read-tree did. Signed-off-by: Thomas Rast Signed-off-by: Junio C Hamano But that was counteracted by: commit 3fde386a40f38dbaa684c17603e71909b862d021 Author: Martin von Zweigbergk Date: Mon Jan 14 21:47:51 2013 -0800 reset [--mixed]: use diff-based reset whether or not pathspec was given Thanks to b65982b (Optimize "diff-index --cached" using cache-tree, 2009-05-20), resetting with paths is much faster than resetting without paths. Some timings for the linux-2.6 repo to illustrate this (best of five, warm cache): reset reset . real0m0.219s0m0.080s user0m0.140s0m0.040s sys 0m0.070s0m0.030s These two commands should do the same thing, so instead of having the user type the trailing " ." to get the faster do_diff_cache()-based implementation, always use it when doing a mixed reset, with or without paths (so "git reset $rev" would also be faster). Timing "git reset" shows that it indeed becomes as fast as "git reset ." after this patch. Signed-off-by: Martin von Zweigbergk Signed-off-by: Junio C Hamano We never call reset_index now, because we handle it via diff. We could call prime_cache_tree in this case, but I'm not sure if that is a good idea, because it primes it from scratch (and so it opens up all those trees that we are trying to avoid touching). I'm not sure if there's an easy way to update it incrementally; I don't know the cache-tree code very well. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 14:23 -0400, Jeff King wrote: > On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote: > > > On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote: > > > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote: > > > > > > > If I have a git repository with a clean working tree, and I delete the > > > > index, then I can use git reset (with no arguments) to recreate it. > > > > However, when I do recreate it, it doesn't come back the same. I have > > > > not analyzed this in detail, but the effect is that commands like git > > > > status take much longer because they must read objects out of a pack > > > > file. In other words, the index seems to not realize that the index (or > > > > at least most of it) represents the same state as HEAD. If I do git > > > > reset --hard, the index is restored to the original state (it's > > > > byte-for-byte identical), and the pack file is no longer read. > > > > > > Are you sure it's reading a packfile? > > > > Well, it's calling inflate(), and strace says it is reading > > e.g. .git/objects/pack/pack-{idx,pack}. > > > > So, I would say so. > > That seems odd that we would be spending extra time there. We do > inflate() the trees in order to diff the index against HEAD, but we > shouldn't need to inflate any blobs. I just did a fresh clone of linux.git, and it doesn't have TREE (and spends time in inflate). Then I did a reset --hard, and it gained TREE (and stopped spending time in inflate). Then I did a checkout -b, and TREE was lost again (time in inflate). hard reset again (to HEAD, so no change), and TREE returns and status is fast again. Committing doesn't seem to create a TREE extension where one doesn't exist. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, May 22, 2014 at 03:07:49PM -0400, David Turner wrote: > On Thu, 2014-05-22 at 14:39 -0400, Jeff King wrote: > > does show some improvement. Perhaps "git reset" is not writing out the > > cache-tree extension? > > Yes, that seems to be exactly what is going on; the two indexes are > identical up to the point where the TREE extension appears. > > Thanks for clearing that up! > > Do you think that this is a bug in git reset? Possibly. There is a call to prime_cache_tree in builtin/reset.c, which looks like it should trigger during a "mixed" or "hard" reset (and without arguments, you should have a mixed reset). But it doesn't seem to get called. I haven't traced it further. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 14:39 -0400, Jeff King wrote: > does show some improvement. Perhaps "git reset" is not writing out the > cache-tree extension? Yes, that seems to be exactly what is going on; the two indexes are identical up to the point where the TREE extension appears. Thanks for clearing that up! Do you think that this is a bug in git reset? (I'm also going to reply to your other mail because it seems like it maybe points up some related bug) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, May 22, 2014 at 02:17:22PM -0400, David Turner wrote: > In fact, git status does not write the index (at least in this context). > And what is slow is not (only) checking over the working tree, but > reading the packs. There should be no need to read files from the ODB > at all, since the index knows those objects' SHA1s, and it can check > those against the working tree's SHA1s. Interestingly, if strace is to > be believed, git status doesn't even do check the working trees SHA1s > after a git reset; maybe reset already sets the stat bits? Keep in mind that "status" is running _two_ diffs. One between HEAD and the index, and one between the index and the working tree. Your timings might be more interesting just run "git diff-index --cached HEAD", which is the index->HEAD half of that, ignoring the working tree state entirely. Naively, running that diff will involve walking the trees and the index in parallel, which means opening a bunch of tree objects. The cache-tree index extension, however, records tree sha1s, which means we can avoid recursing into parts of the comparison. Comparing the two: $ rm .git/index $ git reset $ time git diff-index --cached HEAD real0m0.044s user0m0.040s sys 0m0.000s $ git reset --hard $ time git diff-index --cached HEAD real0m0.012s user0m0.008s sys 0m0.000s does show some improvement. Perhaps "git reset" is not writing out the cache-tree extension? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote: > On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote: > > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote: > > > > > If I have a git repository with a clean working tree, and I delete the > > > index, then I can use git reset (with no arguments) to recreate it. > > > However, when I do recreate it, it doesn't come back the same. I have > > > not analyzed this in detail, but the effect is that commands like git > > > status take much longer because they must read objects out of a pack > > > file. In other words, the index seems to not realize that the index (or > > > at least most of it) represents the same state as HEAD. If I do git > > > reset --hard, the index is restored to the original state (it's > > > byte-for-byte identical), and the pack file is no longer read. > > > > Are you sure it's reading a packfile? > > Well, it's calling inflate(), and strace says it is reading > e.g. .git/objects/pack/pack-{idx,pack}. > > So, I would say so. That seems odd that we would be spending extra time there. We do inflate() the trees in order to diff the index against HEAD, but we shouldn't need to inflate any blobs. Here it is for me (on linux.git): [before, warm cache] $ time perf record -q git status >/dev/null real0m0.192s user0m0.080s sys 0m0.108s $ perf report | grep -v '#' | head -5 7.46% git [kernel.kallsyms] [k] __d_lookup_rcu 4.55% git libz.so.1.2.8 [.] inflate 3.53% git libc-2.18.so[.] __memcmp_sse4_1 3.46% git [kernel.kallsyms] [k] security_inode_getattr 3.29% git git [.] memihash $ time git reset real0m0.080s user0m0.036s sys 0m0.040s So status is pretty quick, and the time is going to lstat in the kernel, and some tree inflation. Reset is fast, because it has nothing much to do. Now let's kill off the index's stat cache: $ rm .git/index $ time perf record -q git reset real0m0.967s user0m0.780s sys 0m0.180s That took a while. What was it doing? $ perf report | grep -v '#' | head -5 3.23% git [kernel.kallsyms] [k] copy_user_enhanced_fast_string 1.74% git libcrypto.so.1.0.0 [.] 0x0007e010 1.60% git [kernel.kallsyms] [k] __d_lookup_rcu 1.51% git [kernel.kallsyms] [k] page_fault 1.44% git libc-2.18.so[.] __memcmp_sse4_1 Reading files and sha1. We hash the working-tree files here (reset doesn't technically need to refresh the index from the working tree to copy entries from HEAD into the index, but it does it so it can do fancy things like tell you about which files are now out-of-date). Now how does stat fare after this? $ time perf record -q git status >/dev/null real0m0.189s user0m0.088s sys 0m0.096s Looks about the same as before to me. Note that if you use "read-tree" instead of "reset", it _just_ loads the index, and doesn't touch the working tree. If you then run "git status", then _that_ command has to refresh the index, and it will pay the hashing cost. Like: $ rm .git/index $ time git read-tree HEAD real0m0.084s user0m0.064s sys 0m0.016s $ time git status >/dev/null real0m0.833s user0m0.712s sys 0m0.112s All of this is behaving as I would expect. Can you show us a set of commands that deviate from this? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 09:46 -0700, Elijah Newren wrote: > On Thu, May 22, 2014 at 9:22 AM, David Turner > wrote: > > If I have a git repository with a clean working tree, and I delete the > > index, then I can use git reset (with no arguments) to recreate it. > > However, when I do recreate it, it doesn't come back the same. I have > > not analyzed this in detail, but the effect is that commands like git > > status take much longer because they must read objects out of a pack > > file. In other words, the index seems to not realize that the index (or > > at least most of it) represents the same state as HEAD. If I do git > > reset --hard, the index is restored to the original state (it's > > byte-for-byte identical), and the pack file is no longer read. > > > > Before I try to dig in to why this should be so, does anyone happen to > > know off the top of their head? Does this constitute a bug in git, or a > > bug in my understanding of git? > > It's not a bug. The index has additional stat-info it tracks about > files -- inode number, mtime, etc. -- so that it can quickly check > whether files are up to date without comparing full file contents in > the working copy to the relevant version from .git/objects. > > 'git reset' means make the index match the commit in HEAD. Sure, so why does git status the need to read the pack file, given that we already know that the index matches HEAD? > It implies > nothing about the working copy files, which could be different. > Although you say that you have a clean working tree, git doesn't check > to verify that, so it has to treat these files as stat-dirty until the > next operation (e.g. git status) fills these in -- an operation that > involves full comparison of the files in the working copy with the > relevant version of the file from under .git/objects. (You may find > 'git update-index --refresh' helpful here.) In fact, git status does not write the index (at least in this context). And what is slow is not (only) checking over the working tree, but reading the packs. There should be no need to read files from the ODB at all, since the index knows those objects' SHA1s, and it can check those against the working tree's SHA1s. Interestingly, if strace is to be believed, git status doesn't even do check the working trees SHA1s after a git reset; maybe reset already sets the stat bits? So that's why I'm confused. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote: > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote: > > > If I have a git repository with a clean working tree, and I delete the > > index, then I can use git reset (with no arguments) to recreate it. > > However, when I do recreate it, it doesn't come back the same. I have > > not analyzed this in detail, but the effect is that commands like git > > status take much longer because they must read objects out of a pack > > file. In other words, the index seems to not realize that the index (or > > at least most of it) represents the same state as HEAD. If I do git > > reset --hard, the index is restored to the original state (it's > > byte-for-byte identical), and the pack file is no longer read. > > Are you sure it's reading a packfile? Well, it's calling inflate(), and strace says it is reading e.g. .git/objects/pack/pack-{idx,pack}. So, I would say so. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, May 22, 2014 at 9:22 AM, David Turner wrote: > If I have a git repository with a clean working tree, and I delete the > index, then I can use git reset (with no arguments) to recreate it. > However, when I do recreate it, it doesn't come back the same. I have > not analyzed this in detail, but the effect is that commands like git > status take much longer because they must read objects out of a pack > file. In other words, the index seems to not realize that the index (or > at least most of it) represents the same state as HEAD. If I do git > reset --hard, the index is restored to the original state (it's > byte-for-byte identical), and the pack file is no longer read. > > Before I try to dig in to why this should be so, does anyone happen to > know off the top of their head? Does this constitute a bug in git, or a > bug in my understanding of git? It's not a bug. The index has additional stat-info it tracks about files -- inode number, mtime, etc. -- so that it can quickly check whether files are up to date without comparing full file contents in the working copy to the relevant version from .git/objects. 'git reset' means make the index match the commit in HEAD. It implies nothing about the working copy files, which could be different. Although you say that you have a clean working tree, git doesn't check to verify that, so it has to treat these files as stat-dirty until the next operation (e.g. git status) fills these in -- an operation that involves full comparison of the files in the working copy with the relevant version of the file from under .git/objects. (You may find 'git update-index --refresh' helpful here.) git reset --hard means not only make the index match the commit in HEAD, but change files in the working copy to match as well. In such a case, git will know that the index matches the working copy (since it is enforcing it), so it can update all the stat-info in the index and future git-status operations will be fast. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git reset for index restoration?
On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote: > If I have a git repository with a clean working tree, and I delete the > index, then I can use git reset (with no arguments) to recreate it. > However, when I do recreate it, it doesn't come back the same. I have > not analyzed this in detail, but the effect is that commands like git > status take much longer because they must read objects out of a pack > file. In other words, the index seems to not realize that the index (or > at least most of it) represents the same state as HEAD. If I do git > reset --hard, the index is restored to the original state (it's > byte-for-byte identical), and the pack file is no longer read. Are you sure it's reading a packfile? I would expect that it is rather reading the files in the working tree. One of the functions of the index is to maintain a cache of the sha1 of the working tree file content and the stat information. Once that is populated, a subsequent "git status" can then just lstat() the files to see that its sha1 cache is up to date. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git reset for index restoration?
If I have a git repository with a clean working tree, and I delete the index, then I can use git reset (with no arguments) to recreate it. However, when I do recreate it, it doesn't come back the same. I have not analyzed this in detail, but the effect is that commands like git status take much longer because they must read objects out of a pack file. In other words, the index seems to not realize that the index (or at least most of it) represents the same state as HEAD. If I do git reset --hard, the index is restored to the original state (it's byte-for-byte identical), and the pack file is no longer read. Before I try to dig in to why this should be so, does anyone happen to know off the top of their head? Does this constitute a bug in git, or a bug in my understanding of git? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html