Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
On Thu, Apr 04, 2013 at 09:59:49PM -0700, Junio C Hamano wrote: > > That means the worst case is not the accidental loss of content, > > but rather confusion by the user when a copy of a file another > > part of the tree is removed. > > A copy of a file that is on the filesystem that may not be related > to the project at all may be lost, and the user may not notice the > lossage for quite a while. A symlink that points at /etc/passwd may > cause the file to be removed and the user will hopefully notice, but > if the pointed-at file is something in $HOME/tmp/ that you occasionally > use, you may not notice the lossage immediately, and when you notice > the loss, the only assurance you have is that there is a blob that > records what was lost _somewhere_ in _some_ of your project that had > a symlink that points at $HOME/tmp/ at some point in the past. It's actually quite hard to lose those files. We will only remove the file if it has a matching index entry. So you cannot do: ln -s /etc foo git rm foo/passwd because there is no index entry for foo/passwd. You would have to do: mkdir foo echo content >foo/passwd git add foo/passwd rm -rf foo ln -s /etc foo git rm foo/passwd and then you only lose it if it matches exactly "content". And recovering it, you know that the original path that held the content was called "passwd". So yes, technically you could lose a file outside of the repo and have trouble finding which path it came from later. But in practice, not really. Anyway, it is academic at this point. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
Jeff King writes: > Here's a replacement for patch 3, then. I wasn't sure if the > editorializing in the last 2 paragraphs should go in the commit message > or the cover letter; feel free to tweak as you see fit. They look fine as they are. > That means the worst case is not the accidental loss of content, > but rather confusion by the user when a copy of a file another > part of the tree is removed. A copy of a file that is on the filesystem that may not be related to the project at all may be lost, and the user may not notice the lossage for quite a while. A symlink that points at /etc/passwd may cause the file to be removed and the user will hopefully notice, but if the pointed-at file is something in $HOME/tmp/ that you occasionally use, you may not notice the lossage immediately, and when you notice the loss, the only assurance you have is that there is a blob that records what was lost _somewhere_ in _some_ of your project that had a symlink that points at $HOME/tmp/ at some point in the past. "Exists somewhere, not lost" is not a very useful assurance, if you ask me ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
On Thu, Apr 04, 2013 at 04:33:39PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > So let's drop patch 3. Do we want instead to have an expect_failure > > documenting the correct behavior? > > I think that is very much preferred. Here's a replacement for patch 3, then. I wasn't sure if the editorializing in the last 2 paragraphs should go in the commit message or the cover letter; feel free to tweak as you see fit. -- >8 -- Subject: [PATCH] t3600: document failure of rm across symbolic links If we have a symlink "d" that points to a directory, we should not be able to remove "d/f". In the normal case, where "d/f" does not exist in the index, we already disallow this, as we only remove things that git knows about in the index. So for something like: ln -s /outside/repo foo git add foo git rm foo/bar we will properly produce an error (as there is no index entry for foo/bar). However, if there is an index entry for the path (e.g., because the movement is due to working tree changes that have not yet been reflected in the index), we will happily delete it, even though the path we delete from the filesystem is not the same as the path in the index. This patch documents that failure with a test. While this is a bug, it should not be possible to cause serious data loss with it. For any path that does not have an index entry, we will complain and bail. For a path which does have an index entry, we will do the usual up-to-date content check. So even if the deleted path in the filesystem is not the same as the one we are removing from the index, we do know that they at least have the same content, and that the content is included in HEAD. That means the worst case is not the accidental loss of content, but rather confusion by the user when a copy of a file another part of the tree is removed. Which makes this bug a minor and hard-to-trigger annoyance rather than a data-loss bug (and hence the fix can be saved for a rainy day when somebody feels like working on it). Signed-off-by: Jeff King --- t/t3600-rm.sh | 28 1 file changed, 28 insertions(+) diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index a2e1a03..0c44e9f 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -659,4 +659,32 @@ test_expect_success 'rm of file when it has become a directory' ' test_path_is_file d/f ' +test_expect_success SYMLINKS 'rm across a symlinked leading path (no index)' ' + rm -rf d e && + mkdir e && + echo content >e/f && + ln -s e d && + git add -A e d && + git commit -m "symlink d to e, e/f exists" && + test_must_fail git rm d/f && + git rev-parse --verify :d && + git rev-parse --verify :e/f && + test -h d && + test_path_is_file e/f +' + +test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' ' + rm -rf d e && + mkdir d && + echo content >d/f && + git add -A e d && + git commit -m "d/f exists" && + mv d e && + ln -s e d && + test_must_fail git rm d/f && + git rev-parse --verify :d/f && + test -h d && + test_path_is_file e/f +' + test_done -- 1.8.2.rc0.33.gd915649 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
Jeff King writes: > So let's drop patch 3. Do we want instead to have an expect_failure > documenting the correct behavior? I think that is very much preferred. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
On Thu, Apr 04, 2013 at 04:12:01PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Deleting across symlinks inside the repo can be brushed aside with "eh, > > well, it is just another way to mention the same name in the > > filesystem". But deleting anything outside of the repo seems actively > > wrong. > > Yup, you finally caught up ;-) IIRC, such an outside repository > target was the case people realized that "git add" shouldn't see > across symlinks. It would help if you spelled it out rather than making me come to it while arguing against you. ;P > > Hmm. I think you have convinced me (or perhaps I have convinced myself) > > that we should generally not be crossing symlink boundaries in > > translating names between the filesystem and index. I still don't want > > to work on it, though. :) > > That is OK. Just let's not etch a wrong behaviour in stone with > that test. So let's drop patch 3. Do we want instead to have an expect_failure documenting the correct behavior? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
Jeff King writes: > Deleting across symlinks inside the repo can be brushed aside with "eh, > well, it is just another way to mention the same name in the > filesystem". But deleting anything outside of the repo seems actively > wrong. Yup, you finally caught up ;-) IIRC, such an outside repository target was the case people realized that "git add" shouldn't see across symlinks. > Hmm. I think you have convinced me (or perhaps I have convinced myself) > that we should generally not be crossing symlink boundaries in > translating names between the filesystem and index. I still don't want > to work on it, though. :) That is OK. Just let's not etch a wrong behaviour in stone with that test. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
On Thu, Apr 04, 2013 at 01:31:43PM -0700, Junio C Hamano wrote: > Jeff King writes: > > >> If you do this: > >> > >>rm -fr d e > >> mkdir e > >> >e/f > >> ln -s e d > >> git add d/f > >> > >> we do complain that d/f is beyond a symlink (meaning that all you > >> can add is the symlink d that may happen to point at something). > > > > Right, but that is because you are adding a bogus entry to the index; we > > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. > > But in the removal case, the index manipulation is perfectly reasonable. > > I think you misread me. I am not adding 'd' as a symlink at all. > IIRC, ancient versions of Git got this case wrong and added d/f to > the index, which we later fixed. I think I just spoke sloppily. What is bogus about "d/f" is not that "d" is necessarily in the index right now, but that adding "d/f" implies that "d" is a tree, which it clearly is not. Git maps filesystem symlinks into the index and into its trees without dereferencing them. So whether we have "d" in the index right now or not, "d/f" is wrong conceptually. But I do not think the "we are mis-representing the filesystem" problem applies to this "rm" case. We are not adding anything bogus into the index; on the contrary, we are deleting something that no longer matches the filesystem representation (and is actually the same bogosity that we avoid adding under the rule above). I do agree that it violates git's general behavior with symlinks (i.e., that they are not dereferenced). > I have been hinting that we should do the same safety not to touch > (even compare the contents of) e/f, because the only reason we even > look at it is because it appears beyond a symbolic link 'd'. I can certainly see the safety argument that crossing a symlink at "d" means the resulting "d/f" is not necessarily related to the original "d/f" that is in the index. As I said, I do not mind having the extra protection; my argument was only that the content-check already protects us, so the extra protection is not necessary. And the implication is that I do not feel like working on it. :) I do not mind at all if you drop my third patch (and that is part of the reason I split it out from patch 2, which I do think is a no-brainer), and I am happy to review patches to do the symlink check if you want to work on it. Having made the argument that the content-check is enough, though, I think there is an interesting corner case where it might not be. I don't mind "git rm d/f" deleting "e/f" inside the repository when "d" is a symlink to "e". But what would happen with: rm -rf d ln -s /path/outside/repo d git rm d/f Deleting across symlinks inside the repo can be brushed aside with "eh, well, it is just another way to mention the same name in the filesystem". But deleting anything outside of the repo seems actively wrong. And more on that "brushed aside". I think it is easy in the cases we have been talking about, namely where "d/f" still exists in the index, to think that "git rm d/f" is useful and the question is one of safety: should we delete e/f if it is pointed to? But let us imagine that d/f is _not_ in the index, but "d" is a symlink pointing to some/long/path". The user wants to be lazy and say "git rm d/f", because typing "some/long/path" is too much work. But what happens to the index? We should probably not be removing "some/long/path". Hmm. I think you have convinced me (or perhaps I have convinced myself) that we should generally not be crossing symlink boundaries in translating names between the filesystem and index. I still don't want to work on it, though. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
Jeff King writes: >> If you do this: >> >> rm -fr d e >> mkdir e >> >e/f >> ln -s e d >> git add d/f >> >> we do complain that d/f is beyond a symlink (meaning that all you >> can add is the symlink d that may happen to point at something). > > Right, but that is because you are adding a bogus entry to the index; we > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. > But in the removal case, the index manipulation is perfectly reasonable. I think you misread me. I am not adding 'd' as a symlink at all. IIRC, ancient versions of Git got this case wrong and added d/f to the index, which we later fixed. I have been hinting that we should do the same safety not to touch (even compare the contents of) e/f, because the only reason we even look at it is because it appears beyond a symbolic link 'd'. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
On Thu, Apr 04, 2013 at 12:42:54PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same > > content)' ' > > + git reset --hard && > > + rm -rf d e && > > + mkdir e && > > + echo content >e/f && > > + ln -s e d && > > + git rm d/f && > > + test_must_fail git rev-parse --verify :d/f && > > + test -h d && > > + test_path_is_dir e > > +' > > This does not check if e/f still exists in the working tree, and I > suspect "git rm d/f" removes it. I guess I should have been more clear in my test; I think it _should_ be removed (and it is). You do not necessarily care that "d" is now the symlink and not the actual path; it is safe to remove d/f even though it is behind a symlink now, because it has the exact same content that it had before (it is of course important that we still remove the actual d/f index entry, but as far as the working tree goes, we only care that it is safe to remove, and that we remove it). IOW, I should have been more explicit like this: diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index 9eaec08..3b51a63 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -687,7 +687,8 @@ test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' git rm d/f && test_must_fail git rev-parse --verify :d/f && test -h d && - test_path_is_dir e + test_path_is_dir e && + test_path_is_missing e/f ' test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' ' > If you do this: > > rm -fr d e > mkdir e > >e/f > ln -s e d > git add d/f > > we do complain that d/f is beyond a symlink (meaning that all you > can add is the symlink d that may happen to point at something). Right, but that is because you are adding a bogus entry to the index; we cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. But in the removal case, the index manipulation is perfectly reasonable. You are deleting the existing "d/f" entry. The only confusion comes from the fact that the working tree does not match that anymore. > Silent removal of e/f that is unrelated to the current project's > tracked contents feels very wrong, and at the same time it looks to > me that it is inconsistent with what we do when adding. > > I need a bit more persuading to understand why it is not a bug, I > think. But that's the point of the two content tests. It _isn't_ unrelated to the current project's tracked contents; it's the exact same content at the same path (albeit accessed via symlinks now). The likely case for this is something like: mv dir somewhere/else ln -s somewhere/else/dir dir I do not mind if you want to insert extra protection to not cross symlink boundaries (which would obviously invalidate my test). But I don't think it is necessary because of the existing content-level protections. Adding extra protections would disallow "git rm dir/file" in the above case, but I don't think it's that inconvenient; the user just has to make the index aware of the typechange first via "git add". -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
Jeff King writes: > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same > content)' ' > + git reset --hard && > + rm -rf d e && > + mkdir e && > + echo content >e/f && > + ln -s e d && > + git rm d/f && > + test_must_fail git rev-parse --verify :d/f && > + test -h d && > + test_path_is_dir e > +' This does not check if e/f still exists in the working tree, and I suspect "git rm d/f" removes it. If you do this: rm -fr d e mkdir e >e/f ln -s e d git add d/f we do complain that d/f is beyond a symlink (meaning that all you can add is the symlink d that may happen to point at something). Silent removal of e/f that is unrelated to the current project's tracked contents feels very wrong, and at the same time it looks to me that it is inconsistent with what we do when adding. I need a bit more persuading to understand why it is not a bug, I think. > +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' > ' > + git reset --hard && > + rm -rf d e && > + mkdir e && > + echo changed >e/f && > + ln -s e d && > + test_must_fail git rm d/f && > + git rev-parse --verify :d/f && > + test -h d && > + test_path_is_file e/f > +' > + > test_done -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] t3600: test rm of path with changed leading symlinks
If we have a path "d/f" but replace "d" with a symlink to a new directory "e", how should we handle "git rm d/f"? It may seem at first like we need new protections to make sure that we do not delete random content from "e/f". However, we are already covered by git-rm's existing protections: it is happy if the working tree file is either already deleted, or if its content matches that of the index and HEAD (and otherwise requires "-f"). Let's add some tests to make sure that these protections remain in place when used across symlinks. We also want to make sure that neither the symlink nor the pointed-to directory is accidentally removed in an attempt to clean up empty elements of the leading path. Signed-off-by: Jeff King --- t/t3600-rm.sh | 43 +++ 1 file changed, 43 insertions(+) diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index a2e1a03..9eaec08 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -659,4 +659,47 @@ test_expect_success 'rm of file when it has become a directory' ' test_path_is_file d/f ' +test_expect_success 'set up commit with d/f' ' + rm -rf d e && + mkdir d && + echo content >d/f && + git add d && + git commit -m d +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (file missing)' ' + git reset --hard && + rm -rf d e && + mkdir e && + ln -s e d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test -h d && + test_path_is_dir e +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' + git reset --hard && + rm -rf d e && + mkdir e && + echo content >e/f && + ln -s e d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test -h d && + test_path_is_dir e +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' ' + git reset --hard && + rm -rf d e && + mkdir e && + echo changed >e/f && + ln -s e d && + test_must_fail git rm d/f && + git rev-parse --verify :d/f && + test -h d && + test_path_is_file e/f +' + test_done -- 1.8.2.rc0.33.gd915649 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html