Re: Behavior of git rm

2013-04-04 Thread Jeff King
On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
>
> > Of the two situations, I think the first one is less likely to be
> > destructive (noticing that a file is already gone via ENOTDIR), as we
> > are only proceeding with the index deletion, and we end up not touching
> > the filesystem at all.
>
> Nice to see sound reasoning.

Here's a patch series which I think covers what we've discussed.

  [1/3]: rm: do not complain about d/f conflicts during deletion
  [2/3]: t3600: test behavior of reverse-d/f conflict
  [3/3]: t3600: test rm of path with changed leading symlinks

The first one is the code change, and the rest just documents the cases
we discussed.

The third one is a little subtle. For the most part it is just testing
the normal "changed content requires --force" behavior of rm. But I
think it is worth having because it also makes sure that after deleting
"d/f" when "d" is a symlink to "e", we do not remove the new
directory "e" or the symlink "d". I do not think this case was
explicitly planned for, but it does the right thing now, and given
the subtlety, I'd rather somebody who changes it notice the breakage in
the test suite.
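
One plausible reason the symlink and its target survive is that rmdir()
does not follow a symbolic link. The sketch below illustrates only that
system-call behavior. It assumes leading directories are cleaned up with
rmdir(), and rmdir_symlink_errno() is a hypothetical helper written for
this illustration, not git code.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper, not git code. In a scratch directory,
 * create directory "e" and a symlink "d" -> "e", then try rmdir("d").
 * Returns the errno from rmdir(), 0 if rmdir() unexpectedly succeeded,
 * or -1 if the scratch setup or the follow-up checks failed.
 * Note that it changes the process's working directory.
 */
static int rmdir_symlink_errno(void)
{
	char tmpl[] = "/tmp/rm-symlink-XXXXXX";
	struct stat st;
	int err;

	if (!mkdtemp(tmpl) || chdir(tmpl) < 0)
		return -1;
	if (mkdir("e", 0777) < 0 || symlink("e", "d") < 0)
		return -1;

	/* rmdir() looks at the link itself, which is not a directory. */
	err = rmdir("d") < 0 ? errno : 0;

	/* Both the link and its target should still be present. */
	if (lstat("d", &st) < 0 || !S_ISLNK(st.st_mode))
		return -1;
	if (lstat("e", &st) < 0 || !S_ISDIR(st.st_mode))
		return -1;

	/* Tidy up the scratch directory. */
	unlink("d");
	rmdir("e");
	if (chdir("/") == 0)
		rmdir(tmpl);
	return err;
}
```

Calling rmdir_symlink_errno() from a small main() and passing the result
to strerror() should report "Not a directory" on Linux.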

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Behavior of git rm

2013-04-03 Thread Jeff King
On Wed, Apr 03, 2013 at 07:50:24AM -0700, jpinheiro wrote:

> While experimenting with git we found an unexpected behavior with git rm.
> Here is a trace of the unexpected behavior:
>
> $ git init
> $ mkdir D
> $ echo Hi > D/F
> $ git add D/F
> $ rm -r D
> $ echo Hey > D
> $ git rm D/F
> warning: 'D/F': Not a directory
> rm 'D/F'
> fatal: git rm: 'D/F': Not a directory

We drop the D/F entry from the index, but then fail to actually remove
it from the filesystem, because it has already been replaced. It is
impossible to tell from this toy example what the true intent was, but
in such a situation, there is a reasonable chance that the user should
have invoked rm --cached in the first place.

That being said, we do try to handle files which have already gone
missing; when unlink() fails, we do not consider it an error if we got
ENOENT. We could perhaps add ENOTDIR to that list, as it also indicates
that the file is gone (it just happens that one of its prefix
directories was replaced with something else).
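
As a rough illustration of the idea (not the actual git code), a
tolerant unlink could look like the sketch below. The name
unlink_if_present() is made up for this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not git's remove_path(): unlink a path and
 * treat "it was already gone" as success.
 */
static int unlink_if_present(const char *path)
{
	if (unlink(path) == 0)
		return 0;
	/*
	 * ENOENT:  nothing exists at that path.
	 * ENOTDIR: a leading component exists but is not a directory,
	 *          so the file cannot exist either.
	 */
	if (errno == ENOENT || errno == ENOTDIR)
		return 0;
	return -1; /* a real failure; errno is left for the caller */
}
```

With this, unlink_if_present("D/F") returns 0 both when D is missing and
when D is a regular file, while other failures (for example, the path
being a directory) still return -1.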

The opposite case is also interesting:

  $ git init
  $ echo 1 >D
  $ git add D
  $ rm D
  $ mkdir D
  $ echo 2 >D/F
  $ git rm D
  rm 'D'
  fatal: git rm: 'D': Is a directory

We expect to see 'D' as a file, but it is now a directory. We _could_
recursively remove the directory, but that has the potential to delete
files that the user does not expect.
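
The "Is a directory" failure can be reproduced at the system-call level.
The sketch below is only an illustration, and unlink_directory_errno()
is a hypothetical helper, not git code. Linux reports EISDIR for
unlink() on a directory, while POSIX specifies EPERM, so other systems
may differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper, not git code. Recreates the "D became a
 * directory" layout in a scratch directory and returns the errno that
 * unlink("D") fails with (0 if it unexpectedly succeeded, -1 if the
 * setup or the follow-up check failed).
 * Note that it changes the process's working directory.
 */
static int unlink_directory_errno(void)
{
	char tmpl[] = "/tmp/rm-isdir-XXXXXX";
	int fd, err;

	if (!mkdtemp(tmpl) || chdir(tmpl) < 0)
		return -1;
	if (mkdir("D", 0777) < 0)
		return -1;
	fd = open("D/F", O_CREAT | O_WRONLY, 0666);
	if (fd < 0)
		return -1;
	close(fd);

	/* Try to remove D as if it were still a plain file. */
	err = unlink("D") < 0 ? errno : 0;

	/* The file inside the directory should be untouched. */
	if (access("D/F", F_OK) < 0)
		return -1;

	/* Tidy up the scratch directory. */
	unlink("D/F");
	rmdir("D");
	if (chdir("/") == 0)
		rmdir(tmpl);
	return err;
}
```

The point of the check on D/F is that a failed unlink() leaves the
unexpected directory contents alone, which is the conservative outcome.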

So in both cases, git rm could certainly detect the situation and
proceed with the destructive operation. But when there is such a
conflict between what's in the working tree and what's in the index, I
think we may be better off erring on the conservative side and bailing,
and letting the user reconcile the differences themselves (using either
git add or git rm --cached to update the index, or deciding how to
handle the working tree contents themselves with regular rm).

Of the two situations, I think the first one is less likely to be
destructive (noticing that a file is already gone via ENOTDIR), as we
are only proceeding with the index deletion, and we end up not touching
the filesystem at all. That patch would look something like:

diff --git a/builtin/rm.c b/builtin/rm.c
index dabfcf6..7b91d52 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
 		ce = active_cache[pos];
 
 		if (lstat(ce->name, &st) < 0) {
-			if (errno != ENOENT)
+			if (errno != ENOENT && errno != ENOTDIR)
 				warning("'%s': %s", ce->name, strerror(errno));
 			/* It already vanished from the working tree */
 			continue;
diff --git a/dir.c b/dir.c
index 57394e4..f9e7355 100644
--- a/dir.c
+++ b/dir.c
@@ -1603,7 +1603,7 @@ int remove_path(const char *name)
 {
 	char *slash;
 
-	if (unlink(name) && errno != ENOENT)
+	if (unlink(name) && errno != ENOENT && errno != ENOTDIR)
 		return -1;
 
 	slash = strrchr(name, '/');


Re: Behavior of git rm

2013-04-03 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> Of the two situations, I think the first one is less likely to be
> destructive (noticing that a file is already gone via ENOTDIR), as we
> are only proceeding with the index deletion, and we end up not touching
> the filesystem at all.

Nice to see sound reasoning.


> diff --git a/builtin/rm.c b/builtin/rm.c
> index dabfcf6..7b91d52 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
>  		ce = active_cache[pos];
>  
>  		if (lstat(ce->name, &st) < 0) {
> -			if (errno != ENOENT)
> +			if (errno != ENOENT && errno != ENOTDIR)

OK.  We may be running lstat() on D/F but there may be D that is not
a directory.  If it is a file, we get ENOTDIR.

By the way, if D is a dangling symlink, we get ENOENT; in such a
case, we report rm 'D/F' on the output and remove the index entry.

    $ rm -f .git/index && rm -fr D E
    $ mkdir D && >D/F && git add D && rm -fr D
    $ ln -s erewhon D && git rm D/F && git ls-files
    rm 'D/F'

Also if D is a symlink that points at a directory E, git rm does
something interesting.

(1) Perhaps we want a complaint in this case.

    $ rm -f .git/index && rm -fr D E
    $ mkdir D && >D/F && git add D && rm -fr D
    $ mkdir E && ln -s E D && git rm D/F

(2) Perhaps we want to make sure D/F is not beyond a symlink in this
    case.

    $ rm -f .git/index && rm -fr D E
    $ mkdir D && >D/F && git add D && rm -fr D
    $ mkdir E && ln -s E D && date >E/F && git rm D/F


    $ git rm -f D/F

> diff --git a/dir.c b/dir.c
> index 57394e4..f9e7355 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1603,7 +1603,7 @@ int remove_path(const char *name)
>  {
>  	char *slash;
>  
> -	if (unlink(name) && errno != ENOENT)
> +	if (unlink(name) && errno != ENOENT && errno != ENOTDIR)
>  		return -1;

Ditto.

>  
>  	slash = strrchr(name, '/');


Re: Behavior of git rm

2013-04-03 Thread Jeff King
On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote:

> > diff --git a/builtin/rm.c b/builtin/rm.c
> > index dabfcf6..7b91d52 100644
> > --- a/builtin/rm.c
> > +++ b/builtin/rm.c
> > @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
> >  		ce = active_cache[pos];
> >  
> >  		if (lstat(ce->name, &st) < 0) {
> > -			if (errno != ENOENT)
> > +			if (errno != ENOENT && errno != ENOTDIR)
>
> OK.  We may be running lstat() on D/F but there may be D that is not
> a directory.  If it is a file, we get ENOTDIR.
>
> By the way, if D is a dangling symlink, we get ENOENT; in such a
> case, we report rm 'D/F' on the output and remove the index entry.
>
>     $ rm -f .git/index && rm -fr D E
>     $ mkdir D && >D/F && git add D && rm -fr D
>     $ ln -s erewhon D && git rm D/F && git ls-files
>     rm 'D/F'

That seems sane to me, and makes me feel like handling ENOTDIR here is
the right direction.  What that conditional is trying to say is "if it
is because the file is not there...", and so far we know of three
conditions where it is not there:

  1. There is no entry at that path.

  2. There is a non-directory in the prefix of that path.

  3. There is a dangling symlink in the prefix of that path.

(1) and (3) we already handle via ENOENT. I think it is sane to handle
(2) the same as (3), but we do not do so currently.
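
The errno for each of the three conditions can be checked with a small
probe. This is a sketch for illustration only: the enum and
lstat_errno() are hypothetical names, and the layout mirrors the D/F
examples from this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical labels for the three "not there" conditions. */
enum prefix_kind {
	PREFIX_MISSING,          /* 1. no entry at that path */
	PREFIX_IS_FILE,          /* 2. non-directory in the prefix */
	PREFIX_DANGLING_SYMLINK  /* 3. dangling symlink in the prefix */
};

/*
 * Hypothetical demo helper, not git code. Builds the requested layout
 * for "D" in a scratch directory and returns the errno from
 * lstat("D/F") (0 if lstat() succeeded, -1 if the setup failed).
 * Note that it changes the process's working directory.
 */
static int lstat_errno(enum prefix_kind kind)
{
	char tmpl[] = "/tmp/rm-lstat-XXXXXX";
	struct stat st;
	int fd, err;

	if (!mkdtemp(tmpl) || chdir(tmpl) < 0)
		return -1;

	switch (kind) {
	case PREFIX_MISSING:
		break; /* leave the scratch directory empty */
	case PREFIX_IS_FILE:
		fd = open("D", O_CREAT | O_WRONLY, 0666);
		if (fd < 0)
			return -1;
		close(fd);
		break;
	case PREFIX_DANGLING_SYMLINK:
		if (symlink("erewhon", "D") < 0)
			return -1;
		break;
	}

	err = lstat("D/F", &st) < 0 ? errno : 0;

	/* Tidy up the scratch directory; "D" may not exist. */
	unlink("D");
	if (chdir("/") == 0)
		rmdir(tmpl);
	return err;
}
```

On Linux, PREFIX_MISSING and PREFIX_DANGLING_SYMLINK both yield ENOENT,
and only PREFIX_IS_FILE yields ENOTDIR, which is the case the patch adds.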

> Also if D is a symlink that points at a directory E, git rm does
> something interesting.
>
> (1) Perhaps we want a complaint in this case.
>
>     $ rm -f .git/index && rm -fr D E
>     $ mkdir D && >D/F && git add D && rm -fr D
>     $ mkdir E && ln -s E D && git rm D/F

I think that is OK without complaint; the user asked to get rid of D/F,
and it is indeed gone (as well as its index entry) after the call
finishes. And we did not even need to delete anything, so we cannot be
losing data. I am much more concerned about this case:

> (2) Perhaps we want to make sure D/F is not beyond a symlink in this
>     case.
>
>     $ rm -f .git/index && rm -fr D E
>     $ mkdir D && >D/F && git add D && rm -fr D
>     $ mkdir E && ln -s E D && date >E/F && git rm D/F

where the user is deleting something that may or may not be related to
the original D/F. On the other hand, I don't have that much sympathy;
rm would make the same deletion. But hmm...shouldn't we be doing an
up-to-date check? Indeed:

  $ git rm D/F
  error: 'D/F' has staged content different from both the file and the HEAD
  (use -f to force removal)
  $ git commit -m foo && git rm D/F
  $ git rm D/F
  error: 'D/F' has local modifications
  (use --cached to keep the file, or -f to force removal)

So I do not think we need any extra safety; the content-level checks
should be enough to make sure we are not losing anything.

-Peff