Re: [RFC] git checkout $tree -- $path always rewrites files
David Aguilar dav...@gmail.com writes: On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote: It might be less risky if the updated semantics were to make the paths that are originally in the index but not in $tree untracked (as opposed to reset --hard emulation where they will be lost) unless they need to be removed to make room for D/F conflict issues, but I haven't thought it through. Git has always been really careful to not lose data. One way to avoid the problem of changing existing semantics is to make the new semantics accessible behind a flag, e.g. git checkout --hard HEAD -- some-new-path. Yup, but you seem to be behind by a few exchanges, as we tentatively decided that we won't talk about changing the semantics and concentrate on fixing the implementation glitches only at least for now ;-) I find that --hard is not a very good name for the new mode. There will be different kinds of more than what we usually do modes of operations discovered over time in the coming years, and it is better to be more specific to denote in what way we are doing it harder (I think the difference the proposed new mode has is to also checkout absense of the paths). But in this particular case, making the paths that are absent in $tree we are checking out of into untracked paths (instead of removing) is a right balance of safety---it is similar to git reset HEAD (no --hard) after adding a new path which leaves the file in the working tree. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 Thanks for that link. It was one of the items in the git blame leftover bits list (websearch for that exact phrase), so I didn't have to do any digging just for this thread ;-) But I made a huge typo above. s/I think/I do not think/; Oh. That might explain some of my confusion. :) The original observation you made in this thread is that when git checkout $tree - $pathspec, whose defintion is to grab paths in the $tree that match $pathspec and put them in the index, and then overwrite the working tree paths with the contents of these paths that are updated in the index with the contents taken from the $tree, unnecessarily rewrites the working tree paths even when they already had contents that were up to date. That is what I called an implementation glitch. The old thread is a different topic. [...] Right, I do agree that these things don't need to be linked. The reason I ended up dealing with the deletion thing is that one obvious way to implement do not touch entries which have not changed is by running a diff between $tree and the index. But doing a diff means we have to reconsider all of the code surrounding deletions (and handling of unmerged entries in the pathspec but not in $tree). I tried a few variants and had trouble making it work (getting caught up either in the make sure each pathspec matched code, or the treat unmerged entries specially behavior). In the patch below, I ended up retaining the existing read_tree_recursive code, and just specially handling the replacement of index entries (which is itself sort of a diff, but it fits nicely into the existing scheme). I'd prefer that these two to be treated separately. Yeah, that makes sense after reading your emails. What I was really unclear on was whether the handling of deletion was a bug or a design choice, and it is the latter (if it were the former, we would not need a transition plan :) ). Anyway, here is the patch. -- 8 -- Subject: checkout $tree: do not throw away unchanged index entries When we git checkout $tree, we pull paths from $tree into the index, and then check the resulting entries out to the worktree. Our method for the first step is rather heavy-handed, though; it clobbers the entire existing index entry, even if the content is the same. This means we lose our stat information, leading checkout_entry to later rewrite the entire file with identical content. Instead, let's see if we have the identical entry already in the index, in which case we leave it in place. That lets checkout_entry do the right thing. Our tests cover two interesting cases: 1. We make sure that a file which has no changes is not rewritten. 2. We make sure that we do update a file that is unchanged in the index (versus $tree), but has working tree changes. We keep the old index entry, and checkout_entry is able to realize that our stat information is out of date. Signed-off-by: Jeff King p...@peff.net --- Note that the test refreshes the index manually (because we are tweaking the timestamp of file2). In normal use this should not be necessary (i.e., your entries should generally be uptodate). I did wonder if checkout should be refreshing the index itself, but it would a bunch of extra lstats in the common case. builtin/checkout.c| 31 +-- t/t2022-checkout-paths.sh | 17 + 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..67cab4e 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new, static int update_some(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, void *context) { - int len; + struct strbuf buf = STRBUF_INIT; struct cache_entry *ce; + int pos; if (S_ISDIR(mode)) return READ_TREE_RECURSIVE; - len = baselen + strlen(pathname); - ce = xcalloc(1, cache_entry_size(len)); + strbuf_add(buf, base, baselen); + strbuf_addstr(buf, pathname); + + /* +* If the entry is the same as the current index, we can leave the old +* entry in place. Whether it is UPTODATE or not, checkout_entry will +* do the right thing. +*/ + pos = cache_name_pos(buf.buf, buf.len); + if (pos = 0) { + struct cache_entry *old = active_cache[pos]; + if (create_ce_mode(mode) == old-ce_mode + !hashcmp(old-sha1, sha1)) { + old-ce_flags |=
Re: [RFC] git checkout $tree -- $path always rewrites files
Jeff King p...@peff.net writes: On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 Thanks for that link. It was one of the items in the git blame leftover bits list (websearch for that exact phrase), so I didn't have to do any digging just for this thread ;-) But I made a huge typo above. s/I think/I do not think/; Oh. That might explain some of my confusion. :) Yeah, tells me to never type on a tablet X-. I'd prefer that these two to be treated separately. Yeah, that makes sense after reading your emails. What I was really unclear on was whether the handling of deletion was a bug or a design choice, and it is the latter (if it were the former, we would not need a transition plan :) ). Yeah, I think we agree to refrain from saying if that design choice was a good one or bad one at least for now. Subject: checkout $tree: do not throw away unchanged index entries When we git checkout $tree, we pull paths from $tree into the index, and then check the resulting entries out to the worktree. Our method for the first step is rather heavy-handed, though; it clobbers the entire existing index entry, even if the content is the same. This means we lose our stat information, leading checkout_entry to later rewrite the entire file with identical content. Instead, let's see if we have the identical entry already in the index, in which case we leave it in place. That lets checkout_entry do the right thing. Our tests cover two interesting cases: 1. We make sure that a file which has no changes is not rewritten. 2. We make sure that we do update a file that is unchanged in the index (versus $tree), but has working tree changes. We keep the old index entry, and checkout_entry is able to realize that our stat information is out of date. Signed-off-by: Jeff King p...@peff.net --- Note that the test refreshes the index manually (because we are tweaking the timestamp of file2). In normal use this should not be necessary (i.e., your entries should generally be uptodate). I did wonder if checkout should be refreshing the index itself, but it would a bunch of extra lstats in the common case. builtin/checkout.c| 31 +-- t/t2022-checkout-paths.sh | 17 + 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..67cab4e 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new, static int update_some(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, void *context) { ... } Makes sense, including the use of strbuf (otherwise you would allocate ce and then discard when it turns out that it is not needed, which is probably with the same allocation pressure, but looks uglier). diff --git a/t/t2022-checkout-paths.sh b/t/t2022-checkout-paths.sh index 8e3545d..f46d049 100755 --- a/t/t2022-checkout-paths.sh +++ b/t/t2022-checkout-paths.sh @@ -61,4 +61,21 @@ test_expect_success 'do not touch unmerged entries matching $path but not in $tr test_cmp expect.next0 actual.next0 ' +test_expect_success 'do not touch files that are already up-to-date' ' + git reset --hard + echo one file1 + echo two file2 + git add file1 file2 + git commit -m base + echo modified file1 + test-chmtime =10 file2 Is the idea behind the hardcoded timestamp that this is sufficiently old (Sep 2001) that we will not get in trouble comparing with the real timestamp we get from the filesystem (which will definitely newer than that anyway) no matter when we run this test (unless you have a time-machine, that is)? + git update-index -q --refresh + git checkout HEAD -- file1 file2 + echo one expect + test_cmp expect file1 + echo 10file2 expect + test-chmtime -v +0 file2 actual + test_cmp expect actual +' + test_done -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Thu, Nov 13, 2014 at 11:15:27AM -0800, Junio C Hamano wrote: diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..67cab4e 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new, static int update_some(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, void *context) { ... } Makes sense, including the use of strbuf (otherwise you would allocate ce and then discard when it turns out that it is not needed, which is probably with the same allocation pressure, but looks uglier). Exactly. Constructing it in ce-name does save you an allocation/memcpy in the case that we actually use the new entry, but I thought it would look weirder. It probably doesn't matter much either way, so I tried to write the most obvious thing. (I also experimented with using make_cache_entry at one point, which requires the strbuf; some of my thinking on what looks reasonable may be left over from that approach). +test_expect_success 'do not touch files that are already up-to-date' ' + git reset --hard + echo one file1 + echo two file2 + git add file1 file2 + git commit -m base + echo modified file1 + test-chmtime =10 file2 Is the idea behind the hardcoded timestamp that this is sufficiently old (Sep 2001) that we will not get in trouble comparing with the real timestamp we get from the filesystem (which will definitely newer than that anyway) no matter when we run this test (unless you have a time-machine, that is)? I didn't actually calculate what the timestamp was. The important thing is that it is not the timestamp that your system would give to the file if git-checkout opened and rewrote it. :) I initially used 123, but was worried that would cause weird portability problems on systems. So I opted for something closer to normal, but in the past. I think it is fine (modulo time machines), but I'd be happy to put in some more obvious sentinel, too. And the worst case if you did have a time machine is that we might accidentally declare a buggy git to be correct (racily!). I can live with that, but I guess you could use a relative value (like -1) instead of a fixed sentinel (but then you'd have to record it for the expect check). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote: Makes sense, including the use of strbuf (otherwise you would allocate ce and then discard when it turns out that it is not needed, which is probably with the same allocation pressure, but looks uglier). Exactly. Constructing it in ce-name does save you an allocation/memcpy in the case that we actually use the new entry, but I thought it would look weirder. It probably doesn't matter much either way, so I tried to write the most obvious thing. Actually, it is not that bad: diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..5a78758 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen, { int len; struct cache_entry *ce; + int pos; if (S_ISDIR(mode)) return READ_TREE_RECURSIVE; @@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen, ce-ce_flags = create_ce_flags(0) | CE_UPDATE; ce-ce_namelen = len; ce-ce_mode = create_ce_mode(mode); + + /* +* If the entry is the same as the current index, we can leave the old +* entry in place. Whether it is UPTODATE or not, checkout_entry will +* do the right thing. +*/ + pos = cache_name_pos(ce-name, ce-ce_namelen); + if (pos = 0) { + struct cache_entry *old = active_cache[pos]; + if (ce-ce_mode == old-ce_mode + !hashcmp(ce-sha1, old-sha1)) { + old-ce_flags |= CE_UPDATE; + free(ce); + return 0; + } + } + add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE); return 0; } and in some ways more readable, as you form the whole thing, and then as the final step either add it, or realize that what is there is fine (I'd almost wonder if it could be a flag to add_cache_entry). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Jeff King p...@peff.net writes: On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote: Makes sense, including the use of strbuf (otherwise you would allocate ce and then discard when it turns out that it is not needed, which is probably with the same allocation pressure, but looks uglier). Exactly. Constructing it in ce-name does save you an allocation/memcpy in the case that we actually use the new entry, but I thought it would look weirder. It probably doesn't matter much either way, so I tried to write the most obvious thing. Actually, it is not that bad: Yeah, actually it does look better; want me to squash it into the patch before queuing? diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..5a78758 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen, { int len; struct cache_entry *ce; + int pos; if (S_ISDIR(mode)) return READ_TREE_RECURSIVE; @@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen, ce-ce_flags = create_ce_flags(0) | CE_UPDATE; ce-ce_namelen = len; ce-ce_mode = create_ce_mode(mode); + + /* + * If the entry is the same as the current index, we can leave the old + * entry in place. Whether it is UPTODATE or not, checkout_entry will + * do the right thing. + */ + pos = cache_name_pos(ce-name, ce-ce_namelen); + if (pos = 0) { + struct cache_entry *old = active_cache[pos]; + if (ce-ce_mode == old-ce_mode + !hashcmp(ce-sha1, old-sha1)) { + old-ce_flags |= CE_UPDATE; + free(ce); + return 0; + } + } + add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE); return 0; } and in some ways more readable, as you form the whole thing, and then as the final step either add it, or realize that what is there is fine (I'd almost wonder if it could be a flag to add_cache_entry). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Thu, Nov 13, 2014 at 01:18:43PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote: Makes sense, including the use of strbuf (otherwise you would allocate ce and then discard when it turns out that it is not needed, which is probably with the same allocation pressure, but looks uglier). Exactly. Constructing it in ce-name does save you an allocation/memcpy in the case that we actually use the new entry, but I thought it would look weirder. It probably doesn't matter much either way, so I tried to write the most obvious thing. Actually, it is not that bad: Yeah, actually it does look better; want me to squash it into the patch before queuing? Yeah, if you like it, too, then let's go with it. Thanks. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). With the updated semantics proposed in the old thread, yes, that is what should happen. git checkout HEAD -- some-new-path do in that case? Likewise. And if some-new-path were a directory, with existing path O and new path N both in the index but only the former in HEAD, the operation would revert some-new-path/O to that of HEAD and remove some-new-path/N. That is the only logical thing we could do if we were to take the updated sematics. That is one of the reasons why I am not 100% convinced that the proposed updated semantics is better, even though I was fairly positive in the old discussion and also I kept the topic in the leftover bits list. The above command is a fairly common way to say I started refactoring the existing path some-path/O and sprinkled its original contents spread into new files A, B and C in the same directory. Now I no longer have O in the working tree, but let me double check by grabbing it out of the state recoded in the commit. You expect that git checkout HEAD -- some-path would not lose A, B or C, knowing some-path only had O. That expectation would even be stronger if you are used to the current semantics, but that is something we could fix, if we decide that the proposed updated semantics is better, with a careful transition plan. It might be less risky if the updated semantics were to make the paths that are originally in the index but not in $tree untracked (as opposed to reset --hard emulation where they will be lost) unless they need to be removed to make room for D/F conflict issues, but I haven't thought it through. Git has always been really careful to not lose data. One way to avoid the problem of changing existing semantics is to make the new semantics accessible behind a flag, e.g. git checkout --hard HEAD -- some-new-path. -- David -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sat, Nov 08, 2014 at 08:19:21AM -0800, Martin von Zweigbergk wrote: What should: git checkout HEAD -- some-new-path do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too. I don't think we'd want it to be deleted. I would view 'git reset --hard' as the role model here, and that command (without paths) would not remove the file. And applying it to a path should not change the behavior, just restrict it to the paths, right? Are you sure about git reset here? If I do: git init echo content file git add file git commit -m base echo modified file echo new some-new-path git add file some-new-path git reset --hard then we delete some-new-path (it is not untracked, because the index knows about it). That makes sense to me. I.e., we treat it with the same preciousness whether it is named explicitly or not. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Jeff King p...@peff.net writes: On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 Thanks for that link. It was one of the items in the git blame leftover bits list (websearch for that exact phrase), so I didn't have to do any digging just for this thread ;-) But I made a huge typo above. s/I think/I do not think/; The original observation you made in this thread is that when git checkout $tree - $pathspec, whose defintion is to grab paths in the $tree that match $pathspec and put them in the index, and then overwrite the working tree paths with the contents of these paths that are updated in the index with the contents taken from the $tree, unnecessarily rewrites the working tree paths even when they already had contents that were up to date. That is what I called an implementation glitch. The old thread is a different topic. It is about changing the semantics of the operation to make paths in the index and in the $tree identical, and then update the working tree paths to also match, all with respect to the $pathspec. This, as Martin noted, needs careful debate on the merit and transition plan if we decide that it is worth doing. The do ignore paths that are not in $tree is a deliberate design choice. I'd prefer that these two to be treated separately. That is, even if our plan did not involve changing the semantics of the operation, we would want to fix the implementation glitch. Compare the $tree with the index using $pathspec, adjust the index by adding paths that are missing from the index that are in $tree, but not removing the entries in the index only because they are not in $tree (note: when the index has a path A, the $tree has a path A/B, and the $pathspec says A, we would end up removing A to make room for A/B, and that should be allowed---it does not fall into the only because the path is not in $tree. In such a scenario, we remove A not because $tree does not have A but because A/B that the $tree has is what we were asked to materialize). And after updating the index that way, do an equivalent of git checkout -- $pathspec. The entries that were the same between the $tree and the index will have the up-to-dateness kept and will not unnecessarily rewrite an unmodified path that way, while things that are modified with the operation will be overwritten, I would think. And with that machinery in place, we could start thinking about updating the semantics. It will be a small change to the loop that goes over the result from diff_index() and modifying the code that used to do a not remove only because not in $tree to do a remove if not in $tree. So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). With the updated semantics proposed in the old thread, yes, that is what should happen. git checkout HEAD -- some-new-path do in that case? Likewise. And if some-new-path were a directory, with existing path O and new path N both in the index but only the former in HEAD, the operation would revert some-new-path/O to that of HEAD and remove some-new-path/N. That is the only logical thing we could do if we were to take the updated sematics. That is one of the reasons why I am not 100% convinced that the proposed updated semantics is better, even though I was fairly positive in the old discussion and also I kept the topic in the leftover bits list. The above command is a fairly common way to say I started refactoring the existing path some-path/O and sprinkled its original contents spread into new files A, B and C in the same directory. Now I no longer have O in the working tree, but let me double check by grabbing it out of the state recoded in the commit. You expect that git checkout HEAD -- some-path would not lose A, B or C, knowing some-path only had O. That expectation would even be stronger if you are used to the current semantics, but that is something we could fix, if we decide that the proposed updated semantics is better, with a careful transition plan. It might be less risky if the updated semantics were to make the paths that are originally in the index but not in $tree untracked (as opposed to reset --hard emulation where they will be lost) unless they need to be removed to make room for D/F conflict issues, but I haven't thought it through. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Jeff King p...@peff.net writes: On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote: So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). What should: git checkout HEAD -- some-new-path do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too. Also, t2022.3 has me very confused. It is explicitly checking that if we have subdir/foo unmerged in the index, and we git checkout $tree -- subdir, and $tree does not mention foo, that we _leave_ foo in place. The definition of how checkout $tree -- $pathspec should behave logically leads to that, I think. Grabbing everything that matches the subdir pathspec from $tree, adding them to the index and checking them out will not touch subdir/foo that does not appear in that $tree. With the proposed updated semantics, it would behave differently, so it is natural that we have tests that protect the traditional definition of how it should behave and we will have to visit them and update their expectation if we decide that the proposed updated semantics is what we want. That seems very counter-intuitive to me. If you asked to make subdir look like $tree, then we should clobber it. Yes. But the existing expectation is different ;-) If the $tree has 'subdir/foo', we would want checkout $tree -- subdir/foo to keep working as a way for the user to declare the resolution of the ongoing merge. With the proposed change in sematics, git checkout $tree -- subdir will become a way to say Everything inside subdir/ I'd resolve to the state recorded in $tree during such a conflicted merge, and it will lose paths not in the $tree. Which may be a good thing, but existing users who are used to the traditional behaviour will find it confusing and even risky (i.e. I am checking OUT; never expected it to lose any path). A counter argument that I find sensible, of course, is You told us to check out subdir/, and the state recorded for subdir/ in $tree does not have 'foo' in it. That is the state you asked us to check out. The argument for checkout $tree -- subdir/foo is essentially the same. The state recorded in $tree for subdir/foo is non-existence, and that is checked out with the proposed new semantics. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
I'm not sure I had seen that particular thread, but it seems like we're all in agreement on that topic. I'm keeping my fingers crossed that Jeff will have time to tackle that this time :-) On Fri, Nov 7, 2014 at 11:35 PM, Junio C Hamano gits...@pobox.com wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 What is on thread here is an implementation glitch, not semantic one. Checking out a directory as opposed to set of paths that match the pathspec was a deliberate design choice, not an implementation glitch. Pardon HTML, misspellings and grammos, typed on a tablet. On Nov 7, 2014 11:10 PM, Martin von Zweigbergk martinv...@gmail.com wrote: Trying again from plain old gmail which I think does not send a multipart content. On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk martinv...@gmail.com wrote: Is this also related to git checkout $rev . not removing removed files? What you say about the difference in implementation between checkout and reset sounds vaguely familiar from when I looked at it some years ago. Curiously, I just talked to Jonathan about this over lunch yesterday. I think we would both be happy to get that oddity of checkout fixed. If what you mention here is indeed related to fixing that, it does complicate things with regards to backwards compatibility. On Fri Nov 07 2014 at 11:24:09 AM Jeff King p...@peff.net wrote: On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Is there a reason that we don't use this diff technique for checkout? I suspect that the reasons are probably mixture of these: (1) the code path may descend from checkout-index and predates the in-core diff machinery; (2) in the context of checkout-index, it was more desirable to be able to say I want the contents on the filesystem, even if you think I already have it there, as you as a new software are likely to be wrong and I know better; or (3) it was easier to code that way ;-) I do not see there is any reason not to do what you suggest. OK. It's not very much code (and can mostly be lifted from git-reset), so I may take a stab at it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 Thanks for that link. I did spend a few hours on this topic earlier today, and got very confused trying to figure out what the deletion behavior _should_ be, and whether I was breaking it. For some reason I had zero recollection of a conversation from last year that I was obviously a major part of. I think I am getting old. :) The end of that thread concludes that a diff-based approach is not going to work, because we need to update the working tree even for files not mentioned by the diff. I do not think that is a show-stopper, though. It just means that we need to load the new index as one step (done now with read_tree_recursive, but ideally using diff), and then walk over the whole resulting index applying our pathspec again (instead of relying on CE_UPDATE flags). This turns out not to be a big deal, because the existing code is already doing most of that second pathspec application anyway. It does it because read_tree_recursive is not smart enough to update the seen bits for the pathspec. But now we would have another reason to do it this way. :) So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). What should: git checkout HEAD -- some-new-path do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote: So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). What should: git checkout HEAD -- some-new-path do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too. Also, t2022.3 has me very confused. It is explicitly checking that if we have subdir/foo unmerged in the index, and we git checkout $tree -- subdir, and $tree does not mention foo, that we _leave_ foo in place. That seems very counter-intuitive to me. If you asked to make subdir look like $tree, then we should clobber it. That change comes from e721c15 (checkout: avoid unnecessary match_pathspec calls, 2013-03-27), where it is mentioned as a _bugfix_. That in turn references 0a1283b (checkout $tree $path: do not clobber local changes in $path not in $tree, 2011-09-30), which explicitly goes against the goal we are talking about here. It is not make my index and working tree look like $tree at all. So now I'm doubly confused about what we want to do. If we want to retain that behavior, I think we can still cover these cases by marking items missing from $tree as to remove during the diff/update the index phase, and then being more gentle with to remove files (e.g., not clobbering changed worktree files unless -f is given). I am not sure that provides a sane user experience, though. Why is it OK to clobber local changes to a file if we are replacing it with other content, but _not_ if we are replacing it with nothing? Either the content we are losing is valuable or not, but it has nothing to do with what we are replacing. And Junio argued in the thread linked elsewhere that the point of git checkout $tree -- $path is to clobber what is in $path, which I would agree with. I think the argument made in 0a1283b is that git checkout $tree $path is not make $path like $tree, but rather pick bits of $path out of $tree. Which would mean this whole deletion thing we are talking about is completely contrary to that. So which is it? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
First of all, thanks again for spending time on this. On Sat, Nov 8, 2014 at 12:30 AM, Jeff King p...@peff.net wrote: On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: So just to be clear, the behavior we want is that: echo foo some-new-path git add some-new-path git checkout HEAD -- . will delete some-new-path (whereas the current code turns it into an untracked file). Yes, I think that's what I would expect. What should: git checkout HEAD -- some-new-path do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too. I don't think we'd want it to be deleted. I would view 'git reset --hard' as the role model here, and that command (without paths) would not remove the file. And applying it to a path should not change the behavior, just restrict it to the paths, right? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] git checkout $tree -- $path always rewrites files
I noticed that git checkout $tree -- $path will _always_ unlink and write a new copy of each matching path, even if they are up-to-date with the index and the content in $tree is the same. Here's a simple reproduction: # some repo with a file in it git init echo content foo git add foo git commit -m foo # give the file a known timestamp (it doesn't matter what) perl -e 'utime(123, 123, @ARGV)' foo git update-index --refresh # now check out an identical copy git checkout HEAD -- foo # and check whether it was updated perl -le 'print ((stat)[9]) for @ARGV' foo which yields a modern timestamp, showing that the file was rewritten. If you use git checkout -- foo instead to checkout from the index, that works fine (the final line prints 123). Similarly, if you load the new index and checkout separately, like: git read-tree -m $tree git checkout -- foo that also works. Of course it is not quite the same; read-tree loads the whole index from $tree, whereas checkout would only touch a subset of the entries. And that's the culprit; checkout does not use unpack-trees to only touch the entries that need updating. It uses read_tree_recursive with a pathspec, and just replaces any index entries that match. Below is a patch which makes it do what I expect by silently copying flags and stat-data from the old entry, but I feel like it is going in the wrong direction. Besides causing a few tests to fail (which I suspect is because I implemented it at the wrong layer), it really feels like this should be using unpack-trees or something similar. After all, that is what git reset $tree -- foo does, and it gets this case right (in fact, you can replace the read-tree above with it). It looks like it actually does a pathspec-limited index-diff rather than unpack-trees, but the end effect is the same (and while checkout could just pass update to unpack-trees, we need to handle checking out directly from the index anyway, so I do not think it is a big deal to keep the index update as a separate step). Is there a reason that we don't use this diff technique for checkout? Anyway, here is the (wrong) patch I came up with initially. diff --git a/builtin/checkout.c b/builtin/checkout.c index 29b0f6d..802e556 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts *opts, ce-ce_flags = ~CE_MATCHED; if (!opts-ignore_skipworktree ce_skip_worktree(ce)) continue; - if (opts-source_tree !(ce-ce_flags CE_UPDATE)) - /* -* git checkout tree-ish -- path, but this entry -* is in the original index; it will not be checked -* out to the working tree and it does not matter -* if pathspec matched this entry. We will not do -* anything to this entry at all. -*/ - continue; + /* -* Either this entry came from the tree-ish we are -* checking the paths out of, or we are checking out -* of the index. +* If we are checking out from a tree-ish, we already +* did our pathspec matching. However, since we then +* drop the CE_UPDATE flag from any paths that do not +* need updating, we don't know which ones were not +* mentioned and which ones were simply up-to-date. * * If it comes from the tree-ish, we already know it * matches the pathspec and could just stamp diff --git a/read-cache.c b/read-cache.c index 8f3e9eb..fb2240b 100644 --- a/read-cache.c +++ b/read-cache.c @@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e /* existing match? Just replace it. */ if (pos = 0) { - if (!new_only) + if (!new_only) { + struct cache_entry *old = istate-cache[pos]; + if (!is_null_sha1(ce-sha1) + !hashcmp(old-sha1, ce-sha1)) + copy_cache_entry(ce, old); replace_index_entry(istate, pos, ce); + } return 0; } pos = -pos-1; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote: I noticed that git checkout $tree -- $path will _always_ unlink and write a new copy of each matching path, even if they are up-to-date with the index and the content in $tree is the same. By the way, one other thing I wondered while looking at this code: when we checkout a working tree file, we unlink the old one and write the new one in-place. Is there a particular reason we do this versus writing to a temporary file and renaming it into place? That would give simultaneous readers a more atomic view. I suspect the answer is something like: you cannot always do a rename, because you might have a typechange, directory becoming a file, or vice versa; so anyone relying on an atomic view during a checkout operation is already Doing It Wrong. Handling a content-change of an existing path would complicate the code, so we do not bother. But I would be curious to hear confirmation of that. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 7, 2014 at 3:38 PM, Jeff King p...@peff.net wrote: On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote: I noticed that git checkout $tree -- $path will _always_ unlink and write a new copy of each matching path, even if they are up-to-date with the index and the content in $tree is the same. By the way, one other thing I wondered while looking at this code: when we checkout a working tree file, we unlink the old one and write the new one in-place. Is there a particular reason we do this versus writing to a temporary file and renaming it into place? That would give simultaneous readers a more atomic view. I suspect the answer is something like: you cannot always do a rename, because you might have a typechange, directory becoming a file, or vice versa; so anyone relying on an atomic view during a checkout operation is already Doing It Wrong. Handling a content-change of an existing path would complicate the code, so we do not bother. Not a confirmation, but it looks like Linus did it just to make sure he had new permissions right, in e447947 (Be much more liberal about the file mode bits. - 2005-04-16). -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Duy Nguyen pclo...@gmail.com writes: On Fri, Nov 7, 2014 at 3:38 PM, Jeff King p...@peff.net wrote: On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote: I noticed that git checkout $tree -- $path will _always_ unlink and write a new copy of each matching path, even if they are up-to-date with the index and the content in $tree is the same. By the way, one other thing I wondered while looking at this code: when we checkout a working tree file, we unlink the old one and write the new one in-place. Is there a particular reason we do this versus writing to a temporary file and renaming it into place? That would give simultaneous readers a more atomic view. I suspect the answer is something like: you cannot always do a rename, because you might have a typechange, directory becoming a file, or vice versa; so anyone relying on an atomic view during a checkout operation is already Doing It Wrong. Handling a content-change of an existing path would complicate the code, so we do not bother. Not a confirmation, but it looks like Linus did it just to make sure he had new permissions right, in e447947 (Be much more liberal about the file mode bits. - 2005-04-16). I think you are referring to the ... to get the new one with the right permissions comment in that patch, but I do not think that affects the choice between (1) unlink and write anew to the final and (2) create a new temporary and rename. Either way, you do not update the existing file in-place and try to checkout the permission bits with chmod(2). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Jeff King p...@peff.net writes: Is there a reason that we don't use this diff technique for checkout? I suspect that the reasons are probably mixture of these: (1) the code path may descend from checkout-index and predates the in-core diff machinery; (2) in the context of checkout-index, it was more desirable to be able to say I want the contents on the filesystem, even if you think I already have it there, as you as a new software are likely to be wrong and I know better; or (3) it was easier to code that way ;-) I do not see there is any reason not to do what you suggest. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 05:13:47PM +0700, Duy Nguyen wrote: By the way, one other thing I wondered while looking at this code: when we checkout a working tree file, we unlink the old one and write the new one in-place. Is there a particular reason we do this versus writing to a temporary file and renaming it into place? That would give simultaneous readers a more atomic view. I suspect the answer is something like: you cannot always do a rename, because you might have a typechange, directory becoming a file, or vice versa; so anyone relying on an atomic view during a checkout operation is already Doing It Wrong. Handling a content-change of an existing path would complicate the code, so we do not bother. Not a confirmation, but it looks like Linus did it just to make sure he had new permissions right, in e447947 (Be much more liberal about the file mode bits. - 2005-04-16). Thanks for digging that up. I think that only gives us half the story, though. That explains why we would unlink/open instead of relying on just open(O_TRUNC). But I think opening a new tempfile would work the same as the current code in that respect. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Is there a reason that we don't use this diff technique for checkout? I suspect that the reasons are probably mixture of these: (1) the code path may descend from checkout-index and predates the in-core diff machinery; (2) in the context of checkout-index, it was more desirable to be able to say I want the contents on the filesystem, even if you think I already have it there, as you as a new software are likely to be wrong and I know better; or (3) it was easier to code that way ;-) I do not see there is any reason not to do what you suggest. OK. It's not very much code (and can mostly be lifted from git-reset), so I may take a stab at it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] git checkout $tree -- $path always rewrites files
Trying again from plain old gmail which I think does not send a multipart content. On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk martinv...@gmail.com wrote: Is this also related to git checkout $rev . not removing removed files? What you say about the difference in implementation between checkout and reset sounds vaguely familiar from when I looked at it some years ago. Curiously, I just talked to Jonathan about this over lunch yesterday. I think we would both be happy to get that oddity of checkout fixed. If what you mention here is indeed related to fixing that, it does complicate things with regards to backwards compatibility. On Fri Nov 07 2014 at 11:24:09 AM Jeff King p...@peff.net wrote: On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Is there a reason that we don't use this diff technique for checkout? I suspect that the reasons are probably mixture of these: (1) the code path may descend from checkout-index and predates the in-core diff machinery; (2) in the context of checkout-index, it was more desirable to be able to say I want the contents on the filesystem, even if you think I already have it there, as you as a new software are likely to be wrong and I know better; or (3) it was easier to code that way ;-) I do not see there is any reason not to do what you suggest. OK. It's not very much code (and can mostly be lifted from git-reset), so I may take a stab at it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html