Re: [RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory

2012-08-28 Thread Jens Lehmann
On 27.08.2012 22:59, Junio C Hamano wrote:
> Jens Lehmann <jens.lehm...@web.de> writes:
>> +{
>> +	int i;
>> +	int errs = 0;
>> +
>> +	for (i = 0; i < list.nr; i++) {
>> +		const char *name = list.entry[i].name;
>> +		int pos;
>> +		struct cache_entry *ce;
>> +		struct stat st;
>> +
>> +		pos = cache_name_pos(name, strlen(name));
>> +		if (pos < 0)
>> +			continue; /* ignore unmerged entry */
>
> Would this cause "git rm -f path" for an unmerged submodule to bypass
> the safety check?

Oops, thanks for spotting that. So replacing the "continue;" with
"pos = -pos-1;" should do the trick here, right? Will add some
tests for unmerged submodules ...
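
The "-pos-1" idiom relies on a lookup convention where an exact hit
returns the index and a miss returns -(insertion point) - 1. The
following is a minimal sketch of that convention only. name_pos(),
usable_pos() and the sorted array of names are hypothetical stand-ins,
not git's cache_name_pos() or its index.

```c
#include <string.h>

/*
 * Hypothetical stand-in for an index lookup: binary search over a
 * sorted array of names. Returns the index on an exact match and
 * -(insertion point) - 1 on a miss, so a miss is always negative.
 */
static int name_pos(const char *const *names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* Recover an array position from either kind of return value. */
static int usable_pos(int pos)
{
	return pos < 0 ? -pos - 1 : pos;
}
```

Note that the recovered position can be equal to the number of entries
when the name sorts after everything else, so a caller still has to
check it against the array size before dereferencing.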

>> +		ce = active_cache[pos];
>> +
>> +		if (!S_ISGITLINK(ce->ce_mode) ||
>> +		    (lstat(ce->name, &st) < 0) ||
>> +		    is_empty_dir(name))
>> +			continue;
>> +
>> +		if (!submodule_uses_gitfile(name))
>> +			errs = error(_("submodule '%s' (or one of its nested "
>> +				     "submodules) uses a .git directory\n"
>> +				     "(use 'rm -rf' if you really want to remove "
>> +				     "it including all of its history)"), name);
>> +	}
>> +
>> +	return errs;
>> +}
>
> The call to this function comes at the very end and gives us yes/no
> for the entire set of paths.  After getting this error for one
> submodule and a bunch of other non-submodule paths, what is the
> procedure for the user to remove it that we want to recommend in our
> documentation?  Would it go like this?
>
>     $ git rm path1 path2 sub path3
>     ... get the above error ...
>     $ git submodule --to-gitfile sub
>     $ rm -fr sub
>     $ git rm sub
>     ... then finally ...
>     $ git rm path1 path2 path3

With current git I'd recommend:

    $ git rm path1 path2 sub path3
    ... get the above error ...
    $ rm -fr sub
    ... try again ...
    $ git rm path1 path2 sub path3

Maybe I should add the hint to repeat the "git rm" after removing the
submodule to the error output?

Once we have implemented "git submodule --to-gitfile", it could be used
instead of "rm -fr sub" to preserve the submodule's repo if the user
wants to.

BTW: I added the same message twice, here for the forced case and in
check_local_mod() when not forced. Is there a recommended way to assign
a localized message to a static variable, so I could define it only once
and reuse it?
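
One common gettext idiom for this, sketched here under assumptions, is
to mark the string for extraction where it is defined with a no-op
macro and to translate it at each point of use. The N_() and _() below
are local stand-ins so the sketch builds without gettext, and
submodule_gitdir_msg and format_gitdir_error() are hypothetical names
that do not come from the patch.

```c
#include <stdio.h>

/* Stand-ins so the sketch builds without gettext. */
#define N_(msgid) msgid	/* marks the string for extraction only */

static const char *_(const char *msgid)
{
	return msgid;	/* a real build would look up a translation here */
}

/* Defined once; the initializer is still a plain string literal. */
static const char submodule_gitdir_msg[] =
	N_("submodule '%s' (or one of its nested submodules) uses a .git directory");

/* Hypothetical helper: translate at the point of use, then format. */
static int format_gitdir_error(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, _(submodule_gitdir_msg), name);
}
```

Both call sites would then pass the same variable through _(), so the
text exists in one place while the translation lookup still happens at
run time.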

>> @@ -80,8 +116,11 @@ static int check_local_mod(unsigned char *head, int index_only)
>>
>>  		/*
>>  		 * Is the index different from the file in the work tree?
>> +		 * If it's a submodule, is its work tree modified?
>>  		 */
>> -		if (ce_match_stat(ce, &st, 0))
>> +		if (ce_match_stat(ce, &st, 0) ||
>> +		    (S_ISGITLINK(ce->ce_mode) &&
>> +		     !ok_to_remove_submodule(ce->name)))
>>  			local_changes = 1;
>
> As noted before, because we also skip these "does it match the
> index?  does it match the HEAD?" checks for unmerged paths in this
> function, a submodule that has local changes or new files is
> eligible for removal during a conflicted merge.  I have a feeling
> that this should be tightened a bit; wouldn't we want to check at
> least in the checked out version (i.e. stage #2 in the index) if the
> path were a submodule, even if we are in the middle of a conflicted
> merge?  After all, the top level merge shouldn't have touched the
> submodule working tree, so the local modifications and new files must
> have come from the end user action that was done _before_ the
> conflicted merge started, and are not expendable, no?

Right, I'll change that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory

2012-08-27 Thread Jens Lehmann
Currently using "git rm" on a submodule - populated or not - fails with
this error:
	fatal: git rm: 'submodule path': Is a directory
This made sense in the past as there was no way to remove a submodule
without possibly removing unpushed parts of the submodule's history
contained in its .git directory too, so erroring out here protected the
user from possible loss of data.

But submodules cloned with a recent git version do not contain the .git
directory anymore, they use a gitfile to point to their git directory
which is safely stored inside the superproject's .git directory. The work
tree of these submodules can safely be removed without losing history, so
let's teach git to do so.
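
The two layouts can be told apart by looking at what kind of file
system entry ".git" is inside the submodule's work tree. The following
is a minimal sketch of that distinction, assuming a POSIX system.
classify_dot_git() is a hypothetical helper and not the patch's
submodule_uses_gitfile(), whose implementation is not shown in this
thread; in particular the sketch does not descend into nested
submodules.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

enum dot_git_kind {
	DOT_GIT_MISSING,	/* no .git entry, e.g. unpopulated */
	DOT_GIT_FILE,		/* gitfile: the repository lives elsewhere */
	DOT_GIT_DIRECTORY,	/* history is stored inside the work tree */
	DOT_GIT_OTHER		/* symlink, overlong path, or anything else */
};

/* Hypothetical helper: classify <path>/.git without following symlinks. */
static enum dot_git_kind classify_dot_git(const char *path)
{
	char buf[4096];
	struct stat st;
	int n = snprintf(buf, sizeof(buf), "%s/.git", path);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return DOT_GIT_OTHER;	/* cannot classify in this sketch */
	if (lstat(buf, &st) < 0)
		return DOT_GIT_MISSING;
	if (S_ISREG(st.st_mode))
		return DOT_GIT_FILE;
	if (S_ISDIR(st.st_mode))
		return DOT_GIT_DIRECTORY;
	return DOT_GIT_OTHER;
}
```

Only the DOT_GIT_DIRECTORY case carries the risk described above,
because removing the work tree would then remove the repository too.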

Using rm on an unpopulated submodule now removes the empty directory from
the work tree and the gitlink from the index. If the submodule's directory
is missing from the work tree, it will still be removed from the index.

Using rm on a populated submodule using a gitfile will apply the usual
checks for work tree modification adapted to submodules (unless forced).
For a submodule that means that the HEAD is the same as recorded in the
index, no tracked files are modified and no untracked files that aren't
ignored are present in the submodule's work tree (ignored files are deemed
expendable and won't stop a submodule's work tree from being removed).
That logic has to be applied in all nested submodules too.

Using rm on a submodule which has its .git directory inside the work tree's
top level directory will just error out like it did before, forced or not.
In the future git could either provide a message informing the user to
convert the submodule to use a gitfile or even attempt to do the
conversion itself, but that is not part of this change.

Signed-off-by: Jens Lehmann <jens.lehm...@web.de>
---


This is the reroll of the "Teach rm to better handle submodules" series
($gmane/201015). It does not attempt to convert submodules that still
contain their git directory (by moving their git directory into
.git/modules/<name> and replacing it with a gitfile pointing there).
That will be subject to a future patch, as I'm not sure yet if "git rm"
should do that automagically or rather tell the user to use a (still
to be added) "git submodule to-gitfile <path>" invocation to achieve
that.

In a follow up patch I'll teach "git rm submod/" to not barf about the
trailing '/', as that is added by TAB completion.


 Documentation/git-rm.txt |  15
 builtin/rm.c             |  95 ++---
 submodule.c              |  80 ++
 submodule.h              |   2 +
 t/t3600-rm.sh            | 210 +++
 5 files changed, 389 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index 5d31860..3c76f9c 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -107,6 +107,21 @@ as well as modifications of existing paths.
 Typically you would first remove all tracked files from the working
 tree using this command:

+Submodules
+
+Only submodules using a gitfile (which means they were cloned
+with a git version 1.7.8 or newer) will be removed from the work
+tree, as their repository lives inside the .git directory of the
+superproject. If a submodule (or one of those nested inside it)
+still uses a .git directory, `git rm` will fail - no matter if forced
+or not - to protect the submodule's history.
+
+A submodule is considered up-to-date when the HEAD is the same as
+recorded in the index, no tracked files are modified and no untracked
+files that aren't ignored are present in the submodule's work tree.
+Ignored files are deemed expendable and won't stop a submodule's work
+tree from being removed.
+
 
 git ls-files -z | xargs -0 rm -f
 
diff --git a/builtin/rm.c b/builtin/rm.c
index 90c8a50..cb927a8 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "tree-walk.h"
 #include "parse-options.h"
+#include "submodule.h"

 static const char * const builtin_rm_usage[] = {
 	"git rm [options] [--] <file>...",
@@ -17,9 +18,43 @@ static const char * const builtin_rm_usage[] = {

 static struct {
 	int nr, alloc;
-	const char **name;
+	struct {
+		const char *name;
+		char is_submodule;
+	} *entry;
 } list;

+static int check_submodules_use_gitfiles()
+{
+	int i;
+	int errs = 0;
+
+	for (i = 0; i < list.nr; i++) {
+		const char *name = list.entry[i].name;
+		int pos;
+		struct cache_entry *ce;
+		struct stat st;
+
+		pos = cache_name_pos(name, strlen(name));
+		if (pos < 0)
+			continue; /* ignore unmerged entry */
+		ce = active_cache[pos];
+
+		if (!S_ISGITLINK(ce->ce_mode) ||
+		    (lstat(ce->name, &st) < 0) ||
+		    is_empty_dir(name))
+   

Re: [RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory

2012-08-27 Thread Junio C Hamano
Jens Lehmann <jens.lehm...@web.de> writes:

> diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
> index 5d31860..3c76f9c 100644
> --- a/Documentation/git-rm.txt
> +++ b/Documentation/git-rm.txt
> @@ -107,6 +107,21 @@ as well as modifications of existing paths.
>  Typically you would first remove all tracked files from the working
>  tree using this command:
>
> +Submodules
> +

You need to match the underline to the text if you want to make this
a heading.

> diff --git a/builtin/rm.c b/builtin/rm.c
> index 90c8a50..cb927a8 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -9,6 +9,7 @@
>  #include "cache-tree.h"
>  #include "tree-walk.h"
>  #include "parse-options.h"
> +#include "submodule.h"
>
>  static const char * const builtin_rm_usage[] = {
>  	"git rm [options] [--] <file>...",
> @@ -17,9 +18,43 @@ static const char * const builtin_rm_usage[] = {
>
>  static struct {
>  	int nr, alloc;
> -	const char **name;
> +	struct {
> +		const char *name;
> +		char is_submodule;
> +	} *entry;
>  } list;
>
> +static int check_submodules_use_gitfiles()

	static int check_submodules_use_gitfiles(void)

> +{
> +	int i;
> +	int errs = 0;
> +
> +	for (i = 0; i < list.nr; i++) {
> +		const char *name = list.entry[i].name;
> +		int pos;
> +		struct cache_entry *ce;
> +		struct stat st;
> +
> +		pos = cache_name_pos(name, strlen(name));
> +		if (pos < 0)
> +			continue; /* ignore unmerged entry */

Would this cause "git rm -f path" for an unmerged submodule to bypass
the safety check?

With or without this patch, check_local_mod() will allow you to
remove an unmerged entry and the file in the working tree, and that is
perfectly fine for a regular file or a symlink (as the path is
involved in a conflicted merge (or other mergy operation), and its
change from the HEAD can only come from that merge, because we would
not let merge touch a path and leave its index entry unmerged if the
path has local changes in the first place).  Resolving the merge as
a removal at the index level for a submodule is also fine in such a
case, but don't you want to still keep the submodule working tree if
it has its sole copy of the repository?  And as far as I can tell,
this function is the only thing that protects the user in such a
situation.

> +		ce = active_cache[pos];
> +
> +		if (!S_ISGITLINK(ce->ce_mode) ||
> +		    (lstat(ce->name, &st) < 0) ||
> +		    is_empty_dir(name))
> +			continue;
> +
> +		if (!submodule_uses_gitfile(name))
> +			errs = error(_("submodule '%s' (or one of its nested "
> +				     "submodules) uses a .git directory\n"
> +				     "(use 'rm -rf' if you really want to remove "
> +				     "it including all of its history)"), name);
> +	}
> +
> +	return errs;
> +}

The call to this function comes at the very end and gives us yes/no
for the entire set of paths.  After getting this error for one
submodule and a bunch of other non-submodule paths, what is the
procedure for the user to remove it that we want to recommend in our
documentation?  Would it go like this?

    $ git rm path1 path2 sub path3
    ... get the above error ...
    $ git submodule --to-gitfile sub
    $ rm -fr sub
    $ git rm sub
    ... then finally ...
    $ git rm path1 path2 path3

This is not a complaint about the error message above, by the way.

> @@ -37,7 +72,7 @@ static int check_local_mod(unsigned char *head, int index_only)
>  		struct stat st;
>  		int pos;
>  		struct cache_entry *ce;
> -		const char *name = list.name[i];
> +		const char *name = list.entry[i].name;
>  		unsigned char sha1[20];
>  		unsigned mode;
>  		int local_changes = 0;
> @@ -58,9 +93,10 @@ static int check_local_mod(unsigned char *head, int index_only)
>  			/* if a file was removed and it is now a
>  			 * directory, that is the same as ENOENT as
>  			 * far as git is concerned; we do not track
> -			 * directories.
> +			 * directories unless they are submodules.
>  			 */
> -			continue;
> +			if (!S_ISGITLINK(ce->ce_mode))
> +				continue;
>  		}
>
>  		/*
> @@ -80,8 +116,11 @@ static int check_local_mod(unsigned char *head, int index_only)
>
>  		/*
>  		 * Is the index different from the file in the work tree?
> +		 * If it's a submodule, is its work tree modified?
>  		 */
> -		if (ce_match_stat(ce, &st, 0))
> +		if (ce_match_stat(ce, &st, 0) ||
> +		    (S_ISGITLINK(ce->ce_mode) &&
> +		     !ok_to_remove_submodule(ce->name)))