Re: [RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory
On 27.08.2012 22:59, Junio C Hamano wrote:
> Jens Lehmann jens.lehm...@web.de writes:
>> +{
>> +	int i;
>> +	int errs = 0;
>> +
>> +	for (i = 0; i < list.nr; i++) {
>> +		const char *name = list.entry[i].name;
>> +		int pos;
>> +		struct cache_entry *ce;
>> +		struct stat st;
>> +
>> +		pos = cache_name_pos(name, strlen(name));
>> +		if (pos < 0)
>> +			continue; /* ignore unmerged entry */
>
> Would this cause "git rm -f path" for an unmerged submodule bypass
> the safety check?

Oops, thanks for spotting that. So replacing the "continue;" with
"pos = -pos-1;" should do the trick here, right? Will add some tests
for unmerged submodules ...

>> +		ce = active_cache[pos];
>> +
>> +		if (!S_ISGITLINK(ce->ce_mode) ||
>> +		    (lstat(ce->name, &st) < 0) ||
>> +		    is_empty_dir(name))
>> +			continue;
>> +
>> +		if (!submodule_uses_gitfile(name))
>> +			errs = error(_("submodule '%s' (or one of its nested "
>> +				     "submodules) uses a .git directory\n"
>> +				     "(use 'rm -rf' if you really want to remove "
>> +				     "it including all of its history)"), name);
>> +	}
>> +
>> +	return errs;
>> +}
>
> The call to this function comes at the very end and gives us yes/no
> for the entire set of paths. After getting this error for one
> submodule and bunch of other non-submodule paths, what is the
> procedure for the user to remove it that we want to recommend in our
> documentation? Would it go like this?
>
>	$ git rm path1 path2 sub path3
>	... get the above error ...
>	$ git submodule --to-gitfile sub
>	$ rm -fr sub
>	$ git rm sub
>	... then finally ...
>	$ git rm path1 path2 path3

With current git I'd recommend:

	$ git rm path1 path2 sub path3
	... get the above error ...
	$ rm -fr sub
	... try again ...
	$ git rm path1 path2 sub path3

Maybe I should add the hint to repeat the "git rm" after removing the
submodule to the error output? Once we have implemented
"git submodule --to-gitfile" it could be used instead of "rm -fr sub"
to preserve the submodule's repo if the user wants to.

BTW: I added the same message twice, here for the forced case and in
check_local_mod() when not forced. Is there a recommended way to
assign a localized message to a static variable, so I could define it
only once and reuse it?

>> @@ -80,8 +116,11 @@ static int check_local_mod(unsigned char *head, int index_only)
>>  		/*
>>  		 * Is the index different from the file in the work tree?
>> +		 * If it's a submodule, is its work tree modified?
>>  		 */
>> -		if (ce_match_stat(ce, &st, 0))
>> +		if (ce_match_stat(ce, &st, 0) ||
>> +		    (S_ISGITLINK(ce->ce_mode) &&
>> +		     !ok_to_remove_submodule(ce->name)))
>>  			local_changes = 1;
>
> As noted before, because we also skip these "does it match the
> index? does it match the HEAD?" checks for unmerged paths in this
> function, a submodule that has local changes or new files is
> eligible for removal during a conflicted merge. I have a feeling
> that this should be tightened a bit; wouldn't we want to check at
> least in the checked out version (i.e. stage #2 in the index) if the
> path were a submodule, even if we are in the middle of a conflicted
> merge? After all, the top level merge shouldn't have touched the
> submodule working tree, so the local modes and new files must have
> come from the end user action that was done _before_ the conflicted
> merge started, and not expendable, no?

Right, I'll change that.
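For readers following the "pos = -pos-1" suggestion and the stage #2 remark above, here is a minimal, self-contained sketch of the idea. It is not git's code: struct entry, name_pos() and pos_to_check() are hypothetical stand-ins, and the only assumption taken from the thread is that a lookup for a merged (stage 0) entry returns a negative value encoding the insertion point when the path is unmerged, with the higher-stage entries for that path sorted right after that point.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an index entry: a path plus a merge stage
 * (0 = merged, 1 = base, 2 = ours, 3 = theirs). */
struct entry {
	const char *name;
	int stage;
};

/* Order by name first, then by stage, as a sorted index would. */
static int entry_cmp(const char *name, int stage, const struct entry *e)
{
	int c = strcmp(name, e->name);
	return c ? c : stage - e->stage;
}

/*
 * Binary search for (name, stage 0). Returns the position if found,
 * otherwise -(insertion point) - 1, which is always negative.
 */
static int name_pos(const struct entry *idx, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(nr >= 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int c = entry_cmp(name, 0, &idx[mid]);

		if (!c)
			return mid;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Pick the entry to run safety checks on: the stage 0 entry for a
 * merged path, else the stage 2 ("ours") entry of an unmerged path.
 * Returns -1 if neither exists.
 */
static int pos_to_check(const struct entry *idx, int nr, const char *name)
{
	int pos = name_pos(idx, nr, name);

	if (pos >= 0)
		return pos;
	pos = -pos - 1; /* first entry sorting after (name, stage 0) */
	for (; pos < nr && !strcmp(idx[pos].name, name); pos++)
		if (idx[pos].stage == 2)
			return pos;
	return -1;
}
```

With an index holding "a.c" at stage 0 and "sub" at stages 1 to 3, name_pos() for "sub" is negative, and pos_to_check() lands on the stage 2 entry instead of skipping the path the way the "continue;" did.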
[RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory
Currently using "git rm" on a submodule - populated or not - fails with
this error:

	fatal: git rm: 'submodule path': Is a directory

This made sense in the past as there was no way to remove a submodule
without possibly removing unpushed parts of the submodule's history
contained in its .git directory too, so erroring out here protected
the user from possible loss of data.

But submodules cloned with a recent git version do not contain the
.git directory anymore, they use a gitfile to point to their git
directory which is safely stored inside the superproject's .git
directory. The work tree of these submodules can safely be removed
without losing history, so let's teach git to do so.

Using rm on an unpopulated submodule now removes the empty directory
from the work tree and the gitlink from the index. If the submodule's
directory is missing from the work tree, it will still be removed
from the index.

Using rm on a populated submodule using a gitfile will apply the usual
checks for work tree modification adapted to submodules (unless
forced). For a submodule that means that the HEAD is the same as
recorded in the index, no tracked files are modified and no untracked
files that aren't ignored are present in the submodule's work tree
(ignored files are deemed expendable and won't stop a submodule's work
tree from being removed). That logic has to be applied in all nested
submodules too.

Using rm on a submodule which has its .git directory inside the work
tree's top-level directory will just error out like it did before,
forced or not. In the future git could either provide a message
informing the user to convert the submodule to use a gitfile or even
attempt to do the conversion itself, but that is not part of this
change.

Signed-off-by: Jens Lehmann jens.lehm...@web.de
---

This is the reroll of the "Teach rm to better handle submodules"
series ($gmane/201015).
It does not attempt to convert submodules that still contain their
git directory (by moving their git directory into .git/modules/name
and replacing it with a gitfile pointing there). That will be subject
to a future patch, as I'm not sure yet if "git rm" should do that
automagically or rather tell the user to use a (still to be added)
"git submodule to-gitfile path" invocation to achieve that.

In a follow up patch I'll teach "git rm submod/" to not barf about the
trailing '/', as that is added by TAB completion.

 Documentation/git-rm.txt |  15
 builtin/rm.c             |  95 ++---
 submodule.c              |  80 ++
 submodule.h              |   2 +
 t/t3600-rm.sh            | 210 +++
 5 files changed, 389 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index 5d31860..3c76f9c 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -107,6 +107,21 @@ as well as modifications of existing paths.
 Typically you would first remove all tracked files from the working
 tree using this command:
 
+Submodules
+
+Only submodules using a gitfile (which means they were cloned
+with a git version 1.7.8 or newer) will be removed from the work
+tree, as their repository lives inside the .git directory of the
+superproject. If a submodule (or one of those nested inside it)
+still use a .git directory, `git rm` will fail - no matter if forced
+or not - to protect the submodules history.
+
+A submodule is considered up-to-date when the HEAD is the same as
+recorded in the index, no tracked files are modified and no untracked
+files that aren't ignored are present in the submodules work tree.
+Ignored files are deemed expendable and won't stop a submodule's work
+tree from being removed.
+
 git ls-files -z | xargs -0 rm -f
diff --git a/builtin/rm.c b/builtin/rm.c
index 90c8a50..cb927a8 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "tree-walk.h"
 #include "parse-options.h"
+#include "submodule.h"
 
 static const char * const builtin_rm_usage[] = {
 	"git rm [options] [--] <file>...",
@@ -17,9 +18,43 @@ static const char * const builtin_rm_usage[] = {
 
 static struct {
 	int nr, alloc;
-	const char **name;
+	struct {
+		const char *name;
+		char is_submodule;
+	} *entry;
 } list;
 
+static int check_submodules_use_gitfiles()
+{
+	int i;
+	int errs = 0;
+
+	for (i = 0; i < list.nr; i++) {
+		const char *name = list.entry[i].name;
+		int pos;
+		struct cache_entry *ce;
+		struct stat st;
+
+		pos = cache_name_pos(name, strlen(name));
+		if (pos < 0)
+			continue; /* ignore unmerged entry */
+		ce = active_cache[pos];
+
+		if (!S_ISGITLINK(ce->ce_mode) ||
+		    (lstat(ce->name, &st) < 0) ||
+		    is_empty_dir(name))
+
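The commit message above hinges on whether a submodule's ".git" is a gitfile or a real directory. As a rough illustration only (this is not the patch's submodule_uses_gitfile(), which also has to consider nested submodules), the distinction can be sketched as below. The enum and classify_dotgit() are hypothetical names; the sketch assumes a gitfile is a regular file whose content starts with "gitdir: ".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical classification of "<worktree>/.git". */
enum dotgit_kind {
	DOTGIT_MISSING,   /* no .git entry at all (e.g. unpopulated) */
	DOTGIT_DIRECTORY, /* repository lives inside the work tree */
	DOTGIT_GITFILE,   /* regular file starting with "gitdir: " */
	DOTGIT_OTHER      /* anything else, or an error in this sketch */
};

static enum dotgit_kind classify_dotgit(const char *worktree)
{
	char path[4096];
	char head[8];
	struct stat st;
	FILE *f;
	size_t n;
	int len;

	assert(worktree != NULL);
	len = snprintf(path, sizeof(path), "%s/.git", worktree);
	if (len < 0 || (size_t)len >= sizeof(path))
		return DOTGIT_OTHER; /* path too long for this sketch */

	if (lstat(path, &st) < 0)
		return DOTGIT_MISSING;
	if (S_ISDIR(st.st_mode))
		return DOTGIT_DIRECTORY;
	if (!S_ISREG(st.st_mode))
		return DOTGIT_OTHER;

	/* A gitfile is expected to begin with the literal "gitdir: ". */
	f = fopen(path, "r");
	if (!f)
		return DOTGIT_OTHER;
	n = fread(head, 1, sizeof(head), f);
	fclose(f);
	if (n == sizeof(head) && !memcmp(head, "gitdir: ", sizeof(head)))
		return DOTGIT_GITFILE;
	return DOTGIT_OTHER;
}
```

Under the policy described in the commit message, only the DOTGIT_GITFILE case (and an empty or missing directory) would let the work tree be removed; DOTGIT_DIRECTORY is the case that makes "git rm" error out, forced or not.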
Re: [RFC v2 PATCH] Teach rm to remove submodules unless they contain a git directory
Jens Lehmann jens.lehm...@web.de writes:

> diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
> index 5d31860..3c76f9c 100644
> --- a/Documentation/git-rm.txt
> +++ b/Documentation/git-rm.txt
> @@ -107,6 +107,21 @@ as well as modifications of existing paths.
>  Typically you would first remove all tracked files from the working
>  tree using this command:
>
> +Submodules
> +

You need to match the underline to the text if you want to make this
a heading.

> diff --git a/builtin/rm.c b/builtin/rm.c
> index 90c8a50..cb927a8 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -9,6 +9,7 @@
>  #include "cache-tree.h"
>  #include "tree-walk.h"
>  #include "parse-options.h"
> +#include "submodule.h"
>
>  static const char * const builtin_rm_usage[] = {
>  	"git rm [options] [--] <file>...",
> @@ -17,9 +18,43 @@ static const char * const builtin_rm_usage[] = {
>
>  static struct {
>  	int nr, alloc;
> -	const char **name;
> +	struct {
> +		const char *name;
> +		char is_submodule;
> +	} *entry;
>  } list;
>
> +static int check_submodules_use_gitfiles()

	static int check_submodules_use_gitfiles(void)

> +{
> +	int i;
> +	int errs = 0;
> +
> +	for (i = 0; i < list.nr; i++) {
> +		const char *name = list.entry[i].name;
> +		int pos;
> +		struct cache_entry *ce;
> +		struct stat st;
> +
> +		pos = cache_name_pos(name, strlen(name));
> +		if (pos < 0)
> +			continue; /* ignore unmerged entry */

Would this cause "git rm -f path" for an unmerged submodule bypass
the safety check?

With or without this patch, check_local_mod() will allow you to
remove unmerged entry and the file in the working tree, and that is
perfectly fine for a regular file or a symlink (as the path is
involved in a conflicted merge (or other mergy operation), and its
change from the HEAD can only come from that merge, because we would
not let merge touch a path and leave its index entry unmerged if the
path has local changes in the first place).

Resolving the merge as a removal at the index level for a submodule
is also fine in such a case, but don't you want to still keep the
submodule working tree if it has its sole copy of the repository?
And as far as I can tell, this function is the only thing that
protects the user in such a situation.

> +		ce = active_cache[pos];
> +
> +		if (!S_ISGITLINK(ce->ce_mode) ||
> +		    (lstat(ce->name, &st) < 0) ||
> +		    is_empty_dir(name))
> +			continue;
> +
> +		if (!submodule_uses_gitfile(name))
> +			errs = error(_("submodule '%s' (or one of its nested "
> +				     "submodules) uses a .git directory\n"
> +				     "(use 'rm -rf' if you really want to remove "
> +				     "it including all of its history)"), name);
> +	}
> +
> +	return errs;
> +}

The call to this function comes at the very end and gives us yes/no
for the entire set of paths. After getting this error for one
submodule and bunch of other non-submodule paths, what is the
procedure for the user to remove it that we want to recommend in our
documentation? Would it go like this?

	$ git rm path1 path2 sub path3
	... get the above error ...
	$ git submodule --to-gitfile sub
	$ rm -fr sub
	$ git rm sub
	... then finally ...
	$ git rm path1 path2 path3

This is not a complaint about the error message above, by the way.

> @@ -37,7 +72,7 @@ static int check_local_mod(unsigned char *head, int index_only)
>  		struct stat st;
>  		int pos;
>  		struct cache_entry *ce;
> -		const char *name = list.name[i];
> +		const char *name = list.entry[i].name;
>  		unsigned char sha1[20];
>  		unsigned mode;
>  		int local_changes = 0;
> @@ -58,9 +93,10 @@ static int check_local_mod(unsigned char *head, int index_only)
>  			/* if a file was removed and it is now a
>  			 * directory, that is the same as ENOENT as
>  			 * far as git is concerned; we do not track
> -			 * directories.
> +			 * directories unless they are submodules.
>  			 */
> -			continue;
> +			if (!S_ISGITLINK(ce->ce_mode))
> +				continue;
>  		}
>
>  		/*
> @@ -80,8 +116,11 @@ static int check_local_mod(unsigned char *head, int index_only)
>  		/*
>  		 * Is the index different from the file in the work tree?
> +		 * If it's a submodule, is its work tree modified?
>  		 */
> -		if (ce_match_stat(ce, &st, 0))
> +		if (ce_match_stat(ce, &st, 0) ||
> +		    (S_ISGITLINK(ce->ce_mode) &&
> +		     !ok_to_remove_submodule(ce->name)))