Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
Torsten Bögershausen writes: > check_stat is 0 on Windows, and inum is allways 0 in lstat(). > I was thinking about systems which don't have inodes and inum, > and then generate an inum in memory, sometimes random. > After a reboot or a re-mount of the file systems those ino values > change. > However, for the initial clone we are fine in any case. Yup. > Now back to the compiler switch: > Windows always set inum to 0 and I can't think about a situation where > a file in a working tree gets inum = 0, can we use the following: > > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > /* >* Windows sets ino to 0. On other FS ino = 0 will already be >* used, so we don't see it for a file in a Git working tree >*/ > if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; Hmm, that is tempting but feels slightly too magical to my taste. Others may easily be able to persuade me to change my mind in this case, though. > /* >* Fallback for NTFS and other case insenstive FS, >* which don't use POSIX inums >*/ > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > } > > >> >> Another thing we maybe want to see is if we can update the caller of >> this function so that we do not overwrite the earlier checkout with >> the data for this path. When two paths collide, we check out one of >> the paths without reporting (because we cannot notice), then attempt >> to check out the other path and report (because we do notice the >> previous one with lstat()). The current code then goes on and overwrites >> the file with the contents from the "other" path. >> >> Even if we had false negative in this loop, if we leave the contents >> for the earlier path while reporting the "other" path, then the user >> can get curious, inspect what contents the "other" path has on the >> filesystem, and can notice that it belongs to the (unreported--due >> to false negative) earlier path. >> > [snip]
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On Thu, Aug 16, 2018 at 4:03 PM Torsten Bögershausen wrote: > check_stat is 0 on Windows, How? check_stat is 1 by default. And "git init" does not add this key on new repos. > Now back to the compiler switch: > Windows always set inum to 0 and I can't think about a situation where > a file in a working tree gets inum = 0, can we use the following: I did consider using zero inum, but dropped it when thinking about checking all file systems. Are you sure zero inode can't be valid? -- Duy
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote: This should answer Duys comments as well. > Torsten Bögershausen writes: > [snip] > > Should the following be protected by core.checkstat ? > > if (check_stat) { > > I do not think such a if statement is strictly necessary. > > Even if check_stat tells us "when checking if a cached stat > information tells us that the path may have modified, use minimum > set of fields from the 'struct stat'", we still capture and update > the values from the same "full" set of fields when we mark a cache > entry up-to-date. So it all depends on why you are limiting with > check_stat. Is it because stdev is unusable? Is it because nsec is > unusable? Is it because ino is unusable? Only in the last case, > paying attention to check_stat will reduce the false positive. > > But then you made me wonder what value check_stat has on Windows. > If it is false, perhaps we do not even need the conditional > compilation, which is a huge plus. Agreed: check_stat is 0 on Windows, and inum is allways 0 in lstat(). I was thinking about systems which don't have inodes and inum, and then generate an inum in memory, sometimes random. After a reboot or a re-mount of the file systems those ino values change. However, for the initial clone we are fine in any case. > > >> + if (dup->ce_stat_data.sd_ino == st->st_ino) { > >> + dup->ce_flags |= CE_MATCHED; > >> + break; > >> + } > >> + } > >> +#endif > > > > Another thing is that we switch of the ASCII case-folding-detection-logic > > off for Windows users, even if we otherwise rely on icase. > > I think we can use fspathcmp() as a fallback. when inodes fail, > > because we may be on a network file system. > > > > (I don't have a test setup at the moment, but what happens with inodes > > when a Windows machine exports a share to Linux or Mac ?) > > > > Is there a chance to get the fspathcmp() back, like this ? > > If fspathcmp() never gives false positives, I do not think we would > mind using it like your update. False negatives are fine, as that > is better than just punting the whole thing when there is no usable > inum. And we do not care all that much if it is more expensive; > this is an error codepath after all. > > And from code structure's point of view, I think it makes sense. It > would be even better if we can lose the conditional compilation. The current implementation of fspathcmp() does not give false positvies, and future versions should not either. All case-insentive file systems have always treated 'a-z' equal to 'A-Z'. In FAT MS/DOS there had only been uppercase letters as file names, and `type file.txt` (the equivilant to ´cat file.txt´ in *nix) simply resultet in `type FILE.TXT` Later, with VFAT and later with HPFS/NTFS a file could be stored on disk as "File.txt". >From now on ´type FILE.TXT´ still worked, (and all other upper-lowercase combinations). This all is probably nothing new. The main point should be that fspathcmp() should never return a false positive, and I think we all agree on that. Now back to the compiler switch: Windows always set inum to 0 and I can't think about a situation where a file in a working tree gets inum = 0, can we use the following: static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { int i; ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; int folded = 0; if (dup == ce) break; if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; /* * Windows sets ino to 0. On other FS ino = 0 will already be * used, so we don't see it for a file in a Git working tree */ if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino)) folded = 1; /* * Fallback for NTFS and other case insenstive FS, * which don't use POSIX inums */ if (!fspathcmp(dup->name, ce->name)) folded = 1; if (folded) { dup->ce_flags |= CE_MATCHED; break; } } } > > Another thing we maybe want to see is if we can update the caller of > this function so that we do not overwrite the earlier checkout with > the data for this path. When two paths collide, we check out one of > the paths without reporting (because we cannot notice), then attempt > to check out the other path and report (because we do notice the > previous one with lstat()). The current code then goes on and overwrites > the file with the contents from the
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
Torsten Bögershausen writes: >> + >> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ >> +for (i = 0; i < state->istate->cache_nr; i++) { >> +struct cache_entry *dup = state->istate->cache[i]; >> + >> +if (dup == ce) >> +break; >> + >> +if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) >> +continue; >> + > > Should the following be protected by core.checkstat ? > if (check_stat) { I do not think such a if statement is strictly necessary. Even if check_stat tells us "when checking if a cached stat information tells us that the path may have modified, use minimum set of fields from the 'struct stat'", we still capture and update the values from the same "full" set of fields when we mark a cache entry up-to-date. So it all depends on why you are limiting with check_stat. Is it because stdev is unusable? Is it because nsec is unusable? Is it because ino is unusable? Only in the last case, paying attention to check_stat will reduce the false positive. But then you made me wonder what value check_stat has on Windows. If it is false, perhaps we do not even need the conditional compilation, which is a huge plus. >> +if (dup->ce_stat_data.sd_ino == st->st_ino) { >> +dup->ce_flags |= CE_MATCHED; >> +break; >> +} >> +} >> +#endif > > Another thing is that we switch of the ASCII case-folding-detection-logic > off for Windows users, even if we otherwise rely on icase. > I think we can use fspathcmp() as a fallback. when inodes fail, > because we may be on a network file system. > > (I don't have a test setup at the moment, but what happens with inodes > when a Windows machine exports a share to Linux or Mac ?) > > Is there a chance to get the fspathcmp() back, like this ? If fspathcmp() never gives false positives, I do not think we would mind using it like your update. False negatives are fine, as that is better than just punting the whole thing when there is no usable inum. And we do not care all that much if it is more expensive; this is an error codepath after all. And from code structure's point of view, I think it makes sense. It would be even better if we can lose the conditional compilation. Another thing we maybe want to see is if we can update the caller of this function so that we do not overwrite the earlier checkout with the data for this path. When two paths collide, we check out one of the paths without reporting (because we cannot notice), then attempt to check out the other path and report (because we do notice the previous one with lstat()). The current code then goes on and overwrites the file with the contents from the "other" path. Even if we had false negative in this loop, if we leave the contents for the earlier path while reporting the "other" path, then the user can get curious, inspect what contents the "other" path has on the filesystem, and can notice that it belongs to the (unreported--due to false negative) earlier path. > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; > #endif > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > }
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On Wed, Aug 15, 2018 at 9:08 PM Torsten Bögershausen wrote: > > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > > + for (i = 0; i < state->istate->cache_nr; i++) { > > + struct cache_entry *dup = state->istate->cache[i]; > > + > > + if (dup == ce) > > + break; > > + > > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | > > CE_SKIP_WORKTREE)) > > + continue; > > + > > Should the following be protected by core.checkstat ? > if (check_stat) { Good catch! st_ino is ignored if core.checkStat is false. I will probably send a separate patch to add more details to config.txt about this key. > > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > > + dup->ce_flags |= CE_MATCHED; > > + break; > > + } > > + } > > +#endif > > Another thing is that we switch of the ASCII case-folding-detection-logic > off for Windows users, even if we otherwise rely on icase. > I think we can use fspathcmp() as a fallback. when inodes fail, > because we may be on a network file system. I admit I did not think about network file system. Will spend some time (and hopefully not on nfs kernel code) on it. For falling back on fspathcmp even on Windows, is it really safe? I'm on Linux and never have to deal with this issue to have any experience. It does sound good though because it should be a subset for any "weird" filesystems out there. > (I don't have a test setup at the moment, but what happens with inodes > when a Windows machine exports a share to Linux or Mac ?) > > Is there a chance to get the fspathcmp() back, like this ? > > static void mark_colliding_entries(const struct checkout *state, >struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | > CE_SKIP_WORKTREE)) > continue; > > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; > #endif > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > } > -- Duy
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On Sun, Aug 12, 2018 at 11:07:14AM +0200, Nguyễn Thái Ngọc Duy wrote: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > v4 removes nr_duplicates (and fixes that false warning Szeder > reported). It also hints about case insensitivity as a cause of > problem because it's most likely the case when this warning shows up. > > builtin/clone.c | 1 + > cache.h | 1 + > entry.c | 28 > t/t5601-clone.sh | 8 +++- > unpack-trees.c | 28 > unpack-trees.h | 1 + > 6 files changed, 66 insertions(+), 1 deletion(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 5c439f1394..0702b0e9d0 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) > memset(, 0, sizeof opts); > opts.update = 1; > opts.merge = 1; > + opts.clone = 1; > opts.fn = oneway_merge; > opts.verbose_update = (option_verbosity >= 0); > opts.src_index = _index; > diff --git a/cache.h b/cache.h > index 8b447652a7..6d6138f4f1 100644 > --- a/cache.h > +++ b/cache.h > @@ -1455,6 +1455,7 @@ struct checkout { > unsigned force:1, >quiet:1, >not_new:1, > + clone:1, >refresh_cache:1; > }; > #define CHECKOUT_INIT { NULL, "" } > diff --git a/entry.c b/entry.c > index b5d1d3cf23..c70340df8e 100644 > --- a/entry.c > +++ b/entry.c > @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct > stat *st, int skiplen) > return lstat(path, st); > } > > +static void mark_colliding_entries(const struct checkout *state, > +struct cache_entry *ce, struct stat *st) > +{ > + int i; > + > + ce->ce_flags |= CE_MATCHED; > + > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > + break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + Should the following be protected by core.checkstat ? if (check_stat) { > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; > + break; > + } > + } > +#endif Another thing is that we switch of the ASCII case-folding-detection-logic off for Windows users, even if we otherwise rely on icase. I think we can use fspathcmp() as a fallback. when inodes fail, because we may be on a network file system. (I don't have a test setup at the moment, but what happens with inodes when a Windows machine exports a share to Linux or Mac ?) Is there a chance to get the fspathcmp() back, like this ? static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { int i; ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; int folded = 0; if (dup == ce) break; if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; if (!fspathcmp(dup->name, ce->name)) folded = 1; #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) folded = 1; #endif if (folded) { dup->ce_flags |= CE_MATCHED; break; } } }
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
Nguyễn Thái Ngọc Duy writes: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > v4 removes nr_duplicates (and fixes that false warning Szeder > reported). It also hints about case insensitivity as a cause of > problem because it's most likely the case when this warning shows up. Ah, you no longer have that counter and the pointer to the counter, as you do not even report how many paths collide ;-) Makes sense. > diff --git a/entry.c b/entry.c > index b5d1d3cf23..c70340df8e 100644 > --- a/entry.c > +++ b/entry.c > @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct > stat *st, int skiplen) > return lstat(path, st); > } > > +static void mark_colliding_entries(const struct checkout *state, > +struct cache_entry *ce, struct stat *st) > +{ > + int i; > + > + ce->ce_flags |= CE_MATCHED; > + > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > + break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; > + break; > + } > + } > +#endif > +} OK. The whole loop might want to become a call to a helper function whose implementation is platform dependent in the future, but that should be kept outside the topic and left for a future enhancement. > @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, > return -1; > } > > + if (state->clone) > + mark_colliding_entries(state, ce, ); OK. I haven't carefully looked at the codepath but is it more involved to instead *not* check out this ce (and leave the working tree file that is already there for another path in the index alone)? I suspect it won't be as simple as if (state->clone) { mark_colliding_entries(state, ce, ); return -1; } but I think it would give much more pleasant end-user experience if we can do so, especially on GIT_WINDOWS_NATIVE. I would imagine that the first thing those who see the message "foo.txt have collided with something else we are not telling you" would want to do is to see what "foo.txt" contains---and it may be obvious to human that it contains the contents intended for "Foo.txt" instead, if we somehow refrained from overwriting it here, which would compensate for the lack of "this is the other path that collided with your file." It is perfectly an acceptable answer if it is "I looked at it, and it is a lot more involved as there are these fallouts from the codepaths that the control flows later from this point you haven't checked and considered---let's keep overwriting, it is much safer." > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 0b62037744..f2eb73bc74 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' > git hash-object -w -t tree --stdin) && > c=$(git commit-tree -m bogus $t) && > git update-ref refs/heads/bogus $c && > - git clone -b bogus . bogus > + git clone -b bogus . bogus 2>warning > ) > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file > detection' ' > + grep X icasefs/warning && > + grep x icasefs/warning && > + test_i18ngrep "the following paths have collided" icasefs/warning > +' > + Ah, I was wondering why possible error message needs to be hidden in the previous test---it is not hiding; it is capturing to look for the paths in the message. Makes sense.
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote: Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. This patch is tested with vim-colorschemes repository on a JFS partition with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy --- v4 removes nr_duplicates (and fixes that false warning Szeder reported). It also hints about case insensitivity as a cause of problem because it's most likely the case when this warning shows up. builtin/clone.c | 1 + cache.h | 1 + entry.c | 28 t/t5601-clone.sh | 8 +++- unpack-trees.c | 28 unpack-trees.h | 1 + 6 files changed, 66 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = _index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..c70340df8e 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + + ce->ce_flags |= CE_MATCHED; + +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (dup->ce_stat_data.sd_ino == st->st_ino) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +#endif +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, ); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..443df048ef 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(); if (o->update)
[PATCH v4] clone: report duplicate entries on case-insensitive filesystems
Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. This patch is tested with vim-colorschemes repository on a JFS partition with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy --- v4 removes nr_duplicates (and fixes that false warning Szeder reported). It also hints about case insensitivity as a cause of problem because it's most likely the case when this warning shows up. builtin/clone.c | 1 + cache.h | 1 + entry.c | 28 t/t5601-clone.sh | 8 +++- unpack-trees.c | 28 unpack-trees.h | 1 + 6 files changed, 66 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = _index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..c70340df8e 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + + ce->ce_flags |= CE_MATCHED; + +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (dup->ce_stat_data.sd_ino == st->st_ino) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +#endif +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, ); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..443df048ef 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if