Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 09:59:49PM -0700, Junio C Hamano wrote:

> > That means the worst case is not the accidental loss of content,
> > but rather confusion by the user when a copy of a file another
> > part of the tree is removed.
> 
> A copy of a file that is on the filesystem that may not be related
> to the project at all may be lost, and the user may not notice the
> lossage for quite a while.  A symlink that points at /etc/passwd may
> cause the file to be removed and the user will hopefully notice, but
> if the pointed-at file is something in $HOME/tmp/ that you occasionally
> use, you may not notice the lossage immediately, and when you notice
> the loss, the only assurance you have is that there is a blob that
> records what was lost _somewhere_ in _some_ of your project that had
> a symlink that points at $HOME/tmp/ at some point in the past.

It's actually quite hard to lose those files. We will only remove the
file if it has a matching index entry. So you cannot do:

  ln -s /etc foo
  git rm foo/passwd

because there is no index entry for foo/passwd. You would have to do:

  mkdir foo
  echo content >foo/passwd
  git add foo/passwd
  rm -rf foo
  ln -s /etc foo
  git rm foo/passwd

and then you only lose it if it matches exactly "content". And
recovering it, you know that the original path that held the content was
called "passwd". So yes, technically you could lose a file outside of
the repo and have trouble finding which path it came from later. But in
practice, not really.

Anyway, it is academic at this point. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

> Here's a replacement for patch 3, then. I wasn't sure if the
> editorializing in the last 2 paragraphs should go in the commit message
> or the cover letter; feel free to tweak as you see fit.

They look fine as they are.

> That means the worst case is not the accidental loss of content,
> but rather confusion by the user when a copy of a file another
> part of the tree is removed.

A copy of a file that is on the filesystem that may not be related
to the project at all may be lost, and the user may not notice the
lossage for quite a while.  A symlink that points at /etc/passwd may
cause the file to be removed and the user will hopefully notice, but
if the pointed-at file is something in $HOME/tmp/ that you occasionally
use, you may not notice the lossage immediately, and when you notice
the loss, the only assurance you have is that there is a blob that
records what was lost _somewhere_ in _some_ of your project that had
a symlink that points at $HOME/tmp/ at some point in the past.

"Exists somewhere, not lost" is not a very useful assurance, if you
ask me ;-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 04:33:39PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > So let's drop patch 3. Do we want instead to have an expect_failure
> > documenting the correct behavior?
> 
> I think that is very much preferred.

Here's a replacement for patch 3, then. I wasn't sure if the
editorializing in the last 2 paragraphs should go in the commit message
or the cover letter; feel free to tweak as you see fit.

-- >8 --
Subject: [PATCH] t3600: document failure of rm across symbolic links

If we have a symlink "d" that points to a directory, we
should not be able to remove "d/f". In the normal case,
where "d/f" does not exist in the index, we already disallow
this, as we only remove things that git knows about in the
index. So for something like:

  ln -s /outside/repo foo
  git add foo
  git rm foo/bar

we will properly produce an error (as there is no index
entry for foo/bar). However, if there is an index entry for
the path (e.g., because the movement is due to working tree
changes that have not yet been reflected in the index), we
will happily delete it, even though the path we delete from the
filesystem is not the same as the path in the index.

This patch documents that failure with a test.

While this is a bug, it should not be possible to cause
serious data loss with it. For any path that does not have
an index entry, we will complain and bail. For a path which
does have an index entry, we will do the usual up-to-date
content check. So even if the deleted path in the filesystem
is not the same as the one we are removing from the index,
we do know that they at least have the same content, and
that the content is included in HEAD.

That means the worst case is not the accidental loss of
content, but rather confusion by the user when a copy of a
file another part of the tree is removed. Which makes this
bug a minor and hard-to-trigger annoyance rather than a
data-loss bug (and hence the fix can be saved for a rainy
day when somebody feels like working on it).

Signed-off-by: Jeff King 
---
 t/t3600-rm.sh | 28 
 1 file changed, 28 insertions(+)

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index a2e1a03..0c44e9f 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -659,4 +659,32 @@ test_expect_success 'rm of file when it has become a 
directory' '
test_path_is_file d/f
 '
 
+test_expect_success SYMLINKS 'rm across a symlinked leading path (no index)' '
+   rm -rf d e &&
+   mkdir e &&
+   echo content >e/f &&
+   ln -s e d &&
+   git add -A e d &&
+   git commit -m "symlink d to e, e/f exists" &&
+   test_must_fail git rm d/f &&
+   git rev-parse --verify :d &&
+   git rev-parse --verify :e/f &&
+   test -h d &&
+   test_path_is_file e/f
+'
+
+test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' '
+   rm -rf d e &&
+   mkdir d &&
+   echo content >d/f &&
+   git add -A e d &&
+   git commit -m "d/f exists" &&
+   mv d e &&
+   ln -s e d &&
+   test_must_fail git rm d/f &&
+   git rev-parse --verify :d/f &&
+   test -h d &&
+   test_path_is_file e/f
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

> So let's drop patch 3. Do we want instead to have an expect_failure
> documenting the correct behavior?

I think that is very much preferred.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 04:12:01PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Deleting across symlinks inside the repo can be brushed aside with "eh,
> > well, it is just another way to mention the same name in the
> > filesystem". But deleting anything outside of the repo seems actively
> > wrong.
> 
> Yup, you finally caught up ;-) IIRC, such an outside repository
> target was the case people realized that "git add" shouldn't see
> across symlinks.

It would help if you spelled it out rather than making me come to it
while arguing against you. ;P

> > Hmm. I think you have convinced me (or perhaps I have convinced myself)
> > that we should generally not be crossing symlink boundaries in
> > translating names between the filesystem and index. I still don't want
> > to work on it, though. :)
> 
> That is OK.  Just let's not etch a wrong behaviour in stone with
> that test.

So let's drop patch 3. Do we want instead to have an expect_failure
documenting the correct behavior?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

> Deleting across symlinks inside the repo can be brushed aside with "eh,
> well, it is just another way to mention the same name in the
> filesystem". But deleting anything outside of the repo seems actively
> wrong.

Yup, you finally caught up ;-) IIRC, such an outside repository
target was the case people realized that "git add" shouldn't see
across symlinks.

> Hmm. I think you have convinced me (or perhaps I have convinced myself)
> that we should generally not be crossing symlink boundaries in
> translating names between the filesystem and index. I still don't want
> to work on it, though. :)

That is OK.  Just let's not etch a wrong behaviour in stone with
that test.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 01:31:43PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> If you do this:
> >> 
> >>rm -fr d e
> >> mkdir e
> >> >e/f
> >> ln -s e d
> >> git add d/f
> >> 
> >> we do complain that d/f is beyond a symlink (meaning that all you
> >> can add is the symlink d that may happen to point at something).
> >
> > Right, but that is because you are adding a bogus entry to the index; we
> > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
> > But in the removal case, the index manipulation is perfectly reasonable.
> 
> I think you misread me.  I am not adding 'd' as a symlink at all.
> IIRC, ancient versions of Git got this case wrong and added d/f to
> the index, which we later fixed.

I think I just spoke sloppily. What is bogus about "d/f" is not that "d"
is necessarily in the index right now, but that adding "d/f" implies
that "d" is a tree, which it clearly is not. Git maps filesystem
symlinks into the index and into its trees without dereferencing them.
So whether we have "d" in the index right now or not, "d/f" is wrong
conceptually.

But I do not think the "we are mis-representing the filesystem" problem
applies to this "rm" case. We are not adding anything bogus into the
index; on the contrary, we are deleting something that no longer matches
the filesystem representation (and is actually the same bogosity that we
avoid adding under the rule above).

I do agree that it violates git's general behavior with symlinks (i.e.,
that they are not dereferenced).

> I have been hinting that we should do the same safety not to touch
> (even compare the contents of) e/f, because the only reason we even
> look at it is because it appears beyond a symbolic link 'd'.

I can certainly see the safety argument that crossing a symlink at "d"
means the resulting "d/f" is not necessarily related to the original
"d/f" that is in the index. As I said, I do not mind having the extra
protection; my argument was only that the content-check already protects
us, so the extra protection is not necessary. And the implication is
that I do not feel like working on it. :) I do not mind at all if you
drop my third patch (and that is part of the reason I split it out from
patch 2, which I do think is a no-brainer), and I am happy to review
patches to do the symlink check if you want to work on it.

Having made the argument that the content-check is enough, though, I
think there is an interesting corner case where it might not be. I don't
mind "git rm d/f" deleting "e/f" inside the repository when "d" is a
symlink to "e". But what would happen with:

  rm -rf d
  ln -s /path/outside/repo d
  git rm d/f

Deleting across symlinks inside the repo can be brushed aside with "eh,
well, it is just another way to mention the same name in the
filesystem". But deleting anything outside of the repo seems actively
wrong.

And more on that "brushed aside". I think it is easy in the cases we
have been talking about, namely where "d/f" still exists in the index,
to think that "git rm d/f" is useful and the question is one of safety:
should we delete e/f if it is pointed to? But let us imagine that d/f is
_not_ in the index, but "d" is a symlink pointing to some/long/path".
The user wants to be lazy and say "git rm d/f", because typing
"some/long/path" is too much work. But what happens to the index? We
should probably not be removing "some/long/path".

Hmm. I think you have convinced me (or perhaps I have convinced myself)
that we should generally not be crossing symlink boundaries in
translating names between the filesystem and index. I still don't want
to work on it, though. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

>> If you do this:
>> 
>>  rm -fr d e
>> mkdir e
>> >e/f
>> ln -s e d
>> git add d/f
>> 
>> we do complain that d/f is beyond a symlink (meaning that all you
>> can add is the symlink d that may happen to point at something).
>
> Right, but that is because you are adding a bogus entry to the index; we
> cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
> But in the removal case, the index manipulation is perfectly reasonable.

I think you misread me.  I am not adding 'd' as a symlink at all.
IIRC, ancient versions of Git got this case wrong and added d/f to
the index, which we later fixed.

I have been hinting that we should do the same safety not to touch
(even compare the contents of) e/f, because the only reason we even
look at it is because it appears beyond a symbolic link 'd'.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 12:42:54PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same 
> > content)' '
> > +   git reset --hard &&
> > +   rm -rf d e &&
> > +   mkdir e &&
> > +   echo content >e/f &&
> > +   ln -s e d &&
> > +   git rm d/f &&
> > +   test_must_fail git rev-parse --verify :d/f &&
> > +   test -h d &&
> > +   test_path_is_dir e
> > +'
> 
> This does not check if e/f still exists in the working tree, and I
> suspect "git rm d/f" removes it.

I guess I should have been more clear in my test; I think it _should_ be
removed (and it is). You do not necessarily care that "d" is now the
symlink and not the actual path; it is safe to remove d/f even though it
is behind a symlink now, because it has the exact same content that it
had before (it is of course important that we still remove the actual
d/f index entry, but as far as the working tree goes, we only care that
it is safe to remove, and that we remove it).

IOW, I should have been more explicit like this:

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 9eaec08..3b51a63 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -687,7 +687,8 @@ test_expect_success SYMLINKS 'replace dir with symlink to 
dir (same content)' '
git rm d/f &&
test_must_fail git rev-parse --verify :d/f &&
test -h d &&
-   test_path_is_dir e
+   test_path_is_dir e &&
+   test_path_is_missing e/f
 '
 
 test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' '

> If you do this:
> 
>   rm -fr d e
> mkdir e
> >e/f
> ln -s e d
> git add d/f
> 
> we do complain that d/f is beyond a symlink (meaning that all you
> can add is the symlink d that may happen to point at something).

Right, but that is because you are adding a bogus entry to the index; we
cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
But in the removal case, the index manipulation is perfectly reasonable.
You are deleting the existing "d/f" entry. The only confusion comes from
the fact that the working tree does not match that anymore.

> Silent removal of e/f that is unrelated to the current project's
> tracked contents feels very wrong, and at the same time it looks to
> me that it is inconsistent with what we do when adding.
> 
> I need a bit more persuading to understand why it is not a bug, I
> think.

But that's the point of the two content tests. It _isn't_ unrelated to
the current project's tracked contents; it's the exact same content at
the same path (albeit accessed via symlinks now). The likely case for
this is something like:

  mv dir somewhere/else
  ln -s somewhere/else/dir dir

I do not mind if you want to insert extra protection to not cross
symlink boundaries (which would obviously invalidate my test).  But I
don't think it is necessary because of the existing content-level
protections.  Adding extra protections would disallow "git rm dir/file" in
the above case, but I don't think it's that inconvenient; the user just
has to make the index aware of the typechange first via "git add".

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

> +test_expect_success SYMLINKS 'replace dir with symlink to dir (same 
> content)' '
> + git reset --hard &&
> + rm -rf d e &&
> + mkdir e &&
> + echo content >e/f &&
> + ln -s e d &&
> + git rm d/f &&
> + test_must_fail git rev-parse --verify :d/f &&
> + test -h d &&
> + test_path_is_dir e
> +'

This does not check if e/f still exists in the working tree, and I
suspect "git rm d/f" removes it.

If you do this:

rm -fr d e
mkdir e
>e/f
ln -s e d
git add d/f

we do complain that d/f is beyond a symlink (meaning that all you
can add is the symlink d that may happen to point at something).

Silent removal of e/f that is unrelated to the current project's
tracked contents feels very wrong, and at the same time it looks to
me that it is inconsistent with what we do when adding.

I need a bit more persuading to understand why it is not a bug, I
think.

> +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' 
> '
> + git reset --hard &&
> + rm -rf d e &&
> + mkdir e &&
> + echo changed >e/f &&
> + ln -s e d &&
> + test_must_fail git rm d/f &&
> + git rev-parse --verify :d/f &&
> + test -h d &&
> + test_path_is_file e/f
> +'
> +
>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] t3600: test rm of path with changed leading symlinks

2013-04-04 Thread Jeff King
If we have a path "d/f" but replace "d" with a symlink to a
new directory "e", how should we handle "git rm d/f"?

It may seem at first like we need new protections to make
sure that we do not delete random content from "e/f".
However, we are already covered by git-rm's existing
protections: it is happy if the working tree file is either
already deleted, or if its content matches that of the index
and HEAD (and otherwise requires "-f").

Let's add some tests to make sure that these protections
remain in place when used across symlinks. We also want to
make sure that neither the symlink nor the pointed-to
directory is accidentally removed in an attempt to clean up
empty elements of the leading path.

Signed-off-by: Jeff King 
---
 t/t3600-rm.sh | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index a2e1a03..9eaec08 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -659,4 +659,47 @@ test_expect_success 'rm of file when it has become a 
directory' '
test_path_is_file d/f
 '
 
+test_expect_success 'set up commit with d/f' '
+   rm -rf d e &&
+   mkdir d &&
+   echo content >d/f &&
+   git add d &&
+   git commit -m d
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (file missing)' '
+   git reset --hard &&
+   rm -rf d e &&
+   mkdir e &&
+   ln -s e d &&
+   git rm d/f &&
+   test_must_fail git rev-parse --verify :d/f &&
+   test -h d &&
+   test_path_is_dir e
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' '
+   git reset --hard &&
+   rm -rf d e &&
+   mkdir e &&
+   echo content >e/f &&
+   ln -s e d &&
+   git rm d/f &&
+   test_must_fail git rev-parse --verify :d/f &&
+   test -h d &&
+   test_path_is_dir e
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' '
+   git reset --hard &&
+   rm -rf d e &&
+   mkdir e &&
+   echo changed >e/f &&
+   ln -s e d &&
+   test_must_fail git rm d/f &&
+   git rev-parse --verify :d/f &&
+   test -h d &&
+   test_path_is_file e/f
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html