Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Junio C Hamano
Torsten Bögershausen  writes:

> check_stat is 0 on Windows, and inum is allways 0 in lstat().
> I was thinking about systems which don't have inodes and inum,
> and then generate an inum in memory, sometimes random.
> After a reboot or a re-mount of the file systems those ino values
> change.
> However, for the initial clone we are fine in any case.

Yup.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:
>
> static void mark_colliding_entries(const struct checkout *state,
>  struct cache_entry *ce, struct stat *st)
> {
>   int i;
>   ce->ce_flags |= CE_MATCHED;
>
>   for (i = 0; i < state->istate->cache_nr; i++) {
>   struct cache_entry *dup = state->istate->cache[i];
>   int folded = 0;
>
>   if (dup == ce)
>   break;
>
>   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>   continue;
>   /*
>* Windows sets ino to 0. On other FS ino = 0 will already be
>*  used, so we don't see it for a file in a Git working tree
>*/
>   if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
>   folded = 1;

Hmm, that is tempting but feels slightly too magical to my taste.
Others may easily be able to persuade me to change my mind in this
case, though.

>   /*
>* Fallback for NTFS and other case insenstive FS,
>* which don't use POSIX inums
>*/
>   if (!fspathcmp(dup->name, ce->name))
>   folded = 1;
>
>   if (folded) {
>   dup->ce_flags |= CE_MATCHED;
>   break;
>   }
>   }
> }
>
>
>> 
>> Another thing we maybe want to see is if we can update the caller of
>> this function so that we do not overwrite the earlier checkout with
>> the data for this path.  When two paths collide, we check out one of
>> the paths without reporting (because we cannot notice), then attempt
>> to check out the other path and report (because we do notice the
>> previous one with lstat()).  The current code then goes on and overwrites
>> the file with the contents from the "other" path.
>> 
>> Even if we had false negative in this loop, if we leave the contents
>> for the earlier path while reporting the "other" path, then the user
>> can get curious, inspect what contents the "other" path has on the
>> filesystem, and can notice that it belongs to the (unreported--due
>> to false negative) earlier path.
>> 
> [snip]


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 4:03 PM Torsten Bögershausen  wrote:
> check_stat is 0 on Windows,

How? check_stat is 1 by default. And "git init" does not add this key
on new repos.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:

I did consider using zero inum, but dropped it when thinking about
checking all file systems. Are you sure zero inode can't be valid?
-- 
Duy


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Torsten Bögershausen
On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duys comments as well.
> Torsten Bögershausen  writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> > if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.

Agreed:
check_stat is 0 on Windows, and inum is allways 0 in lstat().
I was thinking about systems which don't have inodes and inum,
and then generate an inum in memory, sometimes random.
After a reboot or a re-mount of the file systems those ino values
change.
However, for the initial clone we are fine in any case.

> 
> >> +  if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +  dup->ce_flags |= CE_MATCHED;
> >> +  break;
> >> +  }
> >> +  }
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positvies,
and future versions should not either.
All case-insentive file systems have always treated 'a-z' equal to 'A-Z'.
In FAT MS/DOS there had only been uppercase letters as file names,
and `type file.txt` (the equivilant to ´cat file.txt´ in *nix)
simply resultet in `type FILE.TXT`
Later, with VFAT and later with HPFS/NTFS a file could be stored on
disk as "File.txt".
>From now on  ´type FILE.TXT´ still worked, (and all other upper-lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that. 


Now back to the compiler switch:
Windows always set inum to 0 and I can't think about a situation where
a file in a working tree gets inum = 0, can we use the following:

static void mark_colliding_entries(const struct checkout *state,
   struct cache_entry *ce, struct stat *st)
{
int i;
ce->ce_flags |= CE_MATCHED;

for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
int folded = 0;

if (dup == ce)
break;

if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
/*
 * Windows sets ino to 0. On other FS ino = 0 will already be
 *  used, so we don't see it for a file in a Git working tree
 */
if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
folded = 1;

/*
 * Fallback for NTFS and other case insenstive FS,
 * which don't use POSIX inums
 */
if (!fspathcmp(dup->name, ce->name))
folded = 1;

if (folded) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path.  When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()).  The current code then goes on and overwrites
> the file with the contents from the 

Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-15 Thread Junio C Hamano
Torsten Bögershausen  writes:

>> +
>> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>> +for (i = 0; i < state->istate->cache_nr; i++) {
>> +struct cache_entry *dup = state->istate->cache[i];
>> +
>> +if (dup == ce)
>> +break;
>> +
>> +if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>> +continue;
>> +
>
> Should the following be protected by core.checkstat ? 
>   if (check_stat) {

I do not think such a if statement is strictly necessary.

Even if check_stat tells us "when checking if a cached stat
information tells us that the path may have modified, use minimum
set of fields from the 'struct stat'", we still capture and update
the values from the same "full" set of fields when we mark a cache
entry up-to-date.  So it all depends on why you are limiting with
check_stat.  Is it because stdev is unusable?  Is it because nsec is
unusable?  Is it because ino is unusable?  Only in the last case,
paying attention to check_stat will reduce the false positive.

But then you made me wonder what value check_stat has on Windows.
If it is false, perhaps we do not even need the conditional
compilation, which is a huge plus.

>> +if (dup->ce_stat_data.sd_ino == st->st_ino) {
>> +dup->ce_flags |= CE_MATCHED;
>> +break;
>> +}
>> +}
>> +#endif
>
> Another thing is that we switch of the ASCII case-folding-detection-logic
> off for Windows users, even if we otherwise rely on icase.
> I think we can use fspathcmp() as a fallback. when inodes fail,
> because we may be on a network file system.
>
> (I don't have a test setup at the moment, but what happens with inodes
> when a Windows machine exports a share to Linux or Mac ?)
>
> Is there a chance to get the fspathcmp() back, like this ?

If fspathcmp() never gives false positives, I do not think we would
mind using it like your update.  False negatives are fine, as that
is better than just punting the whole thing when there is no usable
inum.  And we do not care all that much if it is more expensive;
this is an error codepath after all.

And from code structure's point of view, I think it makes sense.  It
would be even better if we can lose the conditional compilation.

Another thing we maybe want to see is if we can update the caller of
this function so that we do not overwrite the earlier checkout with
the data for this path.  When two paths collide, we check out one of
the paths without reporting (because we cannot notice), then attempt
to check out the other path and report (because we do notice the
previous one with lstat()).  The current code then goes on and overwrites
the file with the contents from the "other" path.

Even if we had false negative in this loop, if we leave the contents
for the earlier path while reporting the "other" path, then the user
can get curious, inspect what contents the "other" path has on the
filesystem, and can notice that it belongs to the (unreported--due
to false negative) earlier path.

> static void mark_colliding_entries(const struct checkout *state,
>  struct cache_entry *ce, struct stat *st)
> {
>   int i;
>   ce->ce_flags |= CE_MATCHED;
>
>   for (i = 0; i < state->istate->cache_nr; i++) {
>   struct cache_entry *dup = state->istate->cache[i];
>   int folded = 0;
>
>   if (dup == ce)
>   break;
>
>   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>   continue;
>
>   if (!fspathcmp(dup->name, ce->name))
>   folded = 1;
>
> #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>   if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
>   folded = 1;
> #endif
>   if (folded) {
>   dup->ce_flags |= CE_MATCHED;
>   break;
>   }
>   }
> }


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-15 Thread Duy Nguyen
On Wed, Aug 15, 2018 at 9:08 PM Torsten Bögershausen  wrote:
> > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> > + for (i = 0; i < state->istate->cache_nr; i++) {
> > + struct cache_entry *dup = state->istate->cache[i];
> > +
> > + if (dup == ce)
> > + break;
> > +
> > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | 
> > CE_SKIP_WORKTREE))
> > + continue;
> > +
>
> Should the following be protected by core.checkstat ?
> if (check_stat) {

Good catch! st_ino is ignored if core.checkStat is false. I will
probably send a separate patch to add more details to config.txt about
this key.

> > + if (dup->ce_stat_data.sd_ino == st->st_ino) {
> > + dup->ce_flags |= CE_MATCHED;
> > + break;
> > + }
> > + }
> > +#endif
>
> Another thing is that we switch of the ASCII case-folding-detection-logic
> off for Windows users, even if we otherwise rely on icase.
> I think we can use fspathcmp() as a fallback. when inodes fail,
> because we may be on a network file system.

I admit I did not think about network file system. Will spend some
time (and hopefully not on nfs kernel code) on it.

For falling back on fspathcmp even on Windows, is it really safe? I'm
on Linux and never have to deal with this issue to have any
experience. It does sound good though because it should be a subset
for any "weird" filesystems out there.

> (I don't have a test setup at the moment, but what happens with inodes
> when a Windows machine exports a share to Linux or Mac ?)
>
> Is there a chance to get the fspathcmp() back, like this ?
>
> static void mark_colliding_entries(const struct checkout *state,
>struct cache_entry *ce, struct stat *st)
> {
> int i;
> ce->ce_flags |= CE_MATCHED;
>
> for (i = 0; i < state->istate->cache_nr; i++) {
> struct cache_entry *dup = state->istate->cache[i];
> int folded = 0;
>
> if (dup == ce)
> break;
>
> if (dup->ce_flags & (CE_MATCHED | CE_VALID | 
> CE_SKIP_WORKTREE))
> continue;
>
> if (!fspathcmp(dup->name, ce->name))
> folded = 1;
>
> #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
> folded = 1;
> #endif
> if (folded) {
> dup->ce_flags |= CE_MATCHED;
> break;
> }
> }
> }
>


-- 
Duy


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-15 Thread Torsten Bögershausen
On Sun, Aug 12, 2018 at 11:07:14AM +0200, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  v4 removes nr_duplicates (and fixes that false warning Szeder
>  reported). It also hints about case insensitivity as a cause of
>  problem because it's most likely the case when this warning shows up.
> 
>  builtin/clone.c  |  1 +
>  cache.h  |  1 +
>  entry.c  | 28 
>  t/t5601-clone.sh |  8 +++-
>  unpack-trees.c   | 28 
>  unpack-trees.h   |  1 +
>  6 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..0702b0e9d0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
>   memset(, 0, sizeof opts);
>   opts.update = 1;
>   opts.merge = 1;
> + opts.clone = 1;
>   opts.fn = oneway_merge;
>   opts.verbose_update = (option_verbosity >= 0);
>   opts.src_index = _index;
> diff --git a/cache.h b/cache.h
> index 8b447652a7..6d6138f4f1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1455,6 +1455,7 @@ struct checkout {
>   unsigned force:1,
>quiet:1,
>not_new:1,
> +  clone:1,
>refresh_cache:1;
>  };
>  #define CHECKOUT_INIT { NULL, "" }
> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
> stat *st, int skiplen)
>   return lstat(path, st);
>  }
>  
> +static void mark_colliding_entries(const struct checkout *state,
> +struct cache_entry *ce, struct stat *st)
> +{
> + int i;
> +
> + ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> + for (i = 0; i < state->istate->cache_nr; i++) {
> + struct cache_entry *dup = state->istate->cache[i];
> +
> + if (dup == ce)
> + break;
> +
> + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> + continue;
> +

Should the following be protected by core.checkstat ? 
if (check_stat) {


> + if (dup->ce_stat_data.sd_ino == st->st_ino) {
> + dup->ce_flags |= CE_MATCHED;
> + break;
> + }
> + }
> +#endif

Another thing is that we switch of the ASCII case-folding-detection-logic
off for Windows users, even if we otherwise rely on icase.
I think we can use fspathcmp() as a fallback. when inodes fail,
because we may be on a network file system.
(I don't have a test setup at the moment, but what happens with inodes
when a Windows machine exports a share to Linux or Mac ?)

Is there a chance to get the fspathcmp() back, like this ?

static void mark_colliding_entries(const struct checkout *state,
   struct cache_entry *ce, struct stat *st)
{
int i;
ce->ce_flags |= CE_MATCHED;

for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
int folded = 0;

if (dup == ce)
break;

if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;

if (!fspathcmp(dup->name, ce->name))
folded = 1;

#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
folded = 1;
#endif
if (folded) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}



Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
>
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
>
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  v4 removes nr_duplicates (and fixes that false warning Szeder
>  reported). It also hints about case insensitivity as a cause of
>  problem because it's most likely the case when this warning shows up.

Ah, you no longer have that counter and the pointer to the counter,
as you do not even report how many paths collide ;-)  Makes sense.

> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
> stat *st, int skiplen)
>   return lstat(path, st);
>  }
>  
> +static void mark_colliding_entries(const struct checkout *state,
> +struct cache_entry *ce, struct stat *st)
> +{
> + int i;
> +
> + ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> + for (i = 0; i < state->istate->cache_nr; i++) {
> + struct cache_entry *dup = state->istate->cache[i];
> +
> + if (dup == ce)
> + break;
> +
> + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> + continue;
> +
> + if (dup->ce_stat_data.sd_ino == st->st_ino) {
> + dup->ce_flags |= CE_MATCHED;
> + break;
> + }
> + }
> +#endif
> +}

OK.  The whole loop might want to become a call to a helper function
whose implementation is platform dependent in the future, but that
should be kept outside the topic and left for a future enhancement.

> @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
>   return -1;
>   }
>  
> + if (state->clone)
> + mark_colliding_entries(state, ce, );

OK.  I haven't carefully looked at the codepath but is it more
involved to instead *not* check out this ce (and leave the working
tree file that is already there for another path in the index
alone)?  I suspect it won't be as simple as

if (state->clone) {
mark_colliding_entries(state, ce, );
return -1;
}

but I think it would give much more pleasant end-user experience if
we can do so, especially on GIT_WINDOWS_NATIVE.  I would imagine
that the first thing those who see the message "foo.txt have
collided with something else we are not telling you" would want to
do is to see what "foo.txt" contains---and it may be obvious to
human that it contains the contents intended for "Foo.txt" instead,
if we somehow refrained from overwriting it here, which would
compensate for the lack of "this is the other path that collided
with your file."

It is perfectly an acceptable answer if it is "I looked at it, and
it is a lot more involved as there are these fallouts from the
codepaths that the control flows later from this point you haven't
checked and considered---let's keep overwriting, it is much safer."

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 0b62037744..f2eb73bc74 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
>   git hash-object -w -t tree --stdin) &&
>   c=$(git commit-tree -m bogus $t) &&
>   git update-ref refs/heads/bogus $c &&
> - git clone -b bogus . bogus
> + git clone -b bogus . bogus 2>warning
>   )
>  '
>  
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
> detection' '
> + grep X icasefs/warning &&
> + grep x icasefs/warning &&
> + test_i18ngrep "the following paths have collided" icasefs/warning
> +'
> +

Ah, I was wondering why possible error message needs to be hidden in
the previous test---it is not hiding; it is capturing to look for
the paths in the message.  Makes sense.



Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-13 Thread Jeff Hostetler




On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote:

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  v4 removes nr_duplicates (and fixes that false warning Szeder
  reported). It also hints about case insensitivity as a cause of
  problem because it's most likely the case when this warning shows up.

  builtin/clone.c  |  1 +
  cache.h  |  1 +
  entry.c  | 28 
  t/t5601-clone.sh |  8 +++-
  unpack-trees.c   | 28 
  unpack-trees.h   |  1 +
  6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = _index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
  };
  #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..c70340df8e 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
stat *st, int skiplen)
return lstat(path, st);
  }
  
+static void mark_colliding_entries(const struct checkout *state,

+  struct cache_entry *ce, struct stat *st)
+{
+   int i;
+
+   ce->ce_flags |= CE_MATCHED;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if (dup->ce_stat_data.sd_ino == st->st_ino) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+#endif
+}
+
  /*
   * Write the contents from ce out to the working tree.
   *
@@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
  
+		if (state->clone)

+   mark_colliding_entries(state, ce, );
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
  '
  
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '

+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
  partial_clone () {
   SERVER="$1" &&
   URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..443df048ef 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
state.refresh_cache = 1;
state.istate = index;
  
+	if (o->clone) {

+   state.clone = 1;
+   for (i = 0; i < index->cache_nr; i++)
+   index->cache[i]->ce_flags &= ~CE_MATCHED;
+   }
+
progress = get_progress(o);
  
  	if (o->update)

@@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
errs |= finish_delayed_checkout();
if (o->update)

[PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-12 Thread Nguyễn Thái Ngọc Duy
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 v4 removes nr_duplicates (and fixes that false warning Szeder
 reported). It also hints about case insensitivity as a cause of
 problem because it's most likely the case when this warning shows up.

 builtin/clone.c  |  1 +
 cache.h  |  1 +
 entry.c  | 28 
 t/t5601-clone.sh |  8 +++-
 unpack-trees.c   | 28 
 unpack-trees.h   |  1 +
 6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = _index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..c70340df8e 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
stat *st, int skiplen)
return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+  struct cache_entry *ce, struct stat *st)
+{
+   int i;
+
+   ce->ce_flags |= CE_MATCHED;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if (dup->ce_stat_data.sd_ino == st->st_ino) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+#endif
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
 
+   if (state->clone)
+   mark_colliding_entries(state, ce, );
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
   SERVER="$1" &&
   URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..443df048ef 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
state.refresh_cache = 1;
state.istate = index;
 
+   if (o->clone) {
+   state.clone = 1;
+   for (i = 0; i < index->cache_nr; i++)
+   index->cache[i]->ce_flags &= ~CE_MATCHED;
+   }
+
progress = get_progress(o);
 
if (o->update)
@@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
errs |= finish_delayed_checkout();
if (o->update)
git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+   if