Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Junio C Hamano
Duy Nguyen  writes:

>  
> - for (i = 0; i < state->istate->cache_nr; i++) {
> + for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {

There is some typo here, but modulo that this looks like the right
thing to do.

> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct 
> checkout *state,
>   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>   continue;
>  
> - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> - (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> + if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {

This is slightly unfortunate but is the best we can do for now.  

The reason why the design of the "cached stat info" mechanism allows
the sd_* fields to be narrower than the underlying fields is because
they are used only as an early-culling measure (if the value saved
with truncation is different from the current value with truncation,
then they cannot possibly be the same, so we know that the file
changed without looking at the contents).

This use however is different.  Equality of truncated values
immediately declare CE_MATCHED here, producing false negative, which
is not what we want, no?

>   dup->ce_flags |= CE_MATCHED;
> + return;
> + }
> + }



Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Carlo Arenas
ok 99 - colliding file detection

as well in macOS with APFS

Carlo


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Ramsay Jones



On 19/11/2018 23:29, Ramsay Jones wrote:
> 
> 
> On 19/11/2018 21:03, Duy Nguyen wrote:
>> First of all, Ramsay, it would be great if you could test the below
>> patch and see if it works on Cygwin. I assume since Cygwin shares the
>> underlying filesystem, it will share the same "no trusting inode"
>> issue with native builds (or it calculates inodes anyway using some
>> other source?).
> 
> Hmm, I have no idea why you would like me to try this patch - care
> to explain? [I just saw, "Has this been tested on cygwin?" and, since
> it has been happily passing for some time, responded yes!]
> 
> Just for the giggles, I removed the !CYGWIN prerequisite from the
> test and when, as expected, the test failed, had a look around:
> 
> $ pwd
> /home/ramsay/git/t/trash directory.t5601-clone
> $ cat icasefs/warning 
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'x'
> $ cd icasefs/bogus
> $ ls -l
> total 0
> -rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
> $ git ls-files --debug
> ignoring EOIE extension
> X
>   ctime: 1542667201:664036600
>   mtime: 1542667201:663055400
>   dev: 2378432ino: 324352
>   uid: 1001   gid: 513
>   size: 0 flags: 0
> x
>   ctime: 1542667201:665026800
>   mtime: 1542667201:665026800
>   dev: 2378432ino: 324352
>   uid: 1001   gid: 513
>   size: 0 flags: 0
> $ 
> 
> So, both X and x are in the index with the same inode number.
> 
> Does that help?

Well, I haven't even looked at the patch, but when I apply it to
the current 'pu' branch (just what I happened to have checked out)
and run that one test:

$ ./t5601-clone.sh
...
ok 96 - shallow clone locally
ok 97 - GIT_TRACE_PACKFILE produces a usable pack
ok 98 - clone on case-insensitive fs
ok 99 - colliding file detection
ok 100 - partial clone
ok 101 - partial clone: warn if server does not support object filtering
ok 102 - batch missing blob request during checkout
ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks
# passed all 103 test(s)
# SKIP no web server found at '/usr/sbin/apache2'
1..103
$ 

... the colliding file detection test passes!

ATB,
Ramsay Jones




Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Ramsay Jones



On 19/11/2018 21:03, Duy Nguyen wrote:
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).

Hmm, I have no idea why you would like me to try this patch - care
to explain? [I just saw, "Has this been tested on cygwin?" and, since
it has been happily passing for some time, responded yes!]

Just for the giggles, I removed the !CYGWIN prerequisite from the
test and when, as expected, the test failed, had a look around:

$ pwd
/home/ramsay/git/t/trash directory.t5601-clone
$ cat icasefs/warning 
Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'
$ cd icasefs/bogus
$ ls -l
total 0
-rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
$ git ls-files --debug
ignoring EOIE extension
X
  ctime: 1542667201:664036600
  mtime: 1542667201:663055400
  dev: 2378432  ino: 324352
  uid: 1001 gid: 513
  size: 0   flags: 0
x
  ctime: 1542667201:665026800
  mtime: 1542667201:665026800
  dev: 2378432  ino: 324352
  uid: 1001 gid: 513
  size: 0   flags: 0
$ 

So, both X and x are in the index with the same inode number.

Does that help?

ATB,
Ramsay Jones


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Duy Nguyen
On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen  wrote:
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated.
> ...
> We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS.

... and we don't have a problem there. Either Linus predicted dealing
with 64-bit inodes, or he had a habit of casting st_ino to unsigned
int, I cannot tell. This code

ce->st_ino != (unsigned int)st->st_ino

is from e83c516331 (Initial revision of "git", the information manager
from hell - 2005-04-07) and it's still used today for comparing sd_ino
with st->st_ino in read-cache.c. I guess I should have copied and
pasted more often.
-- 
Duy


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Duy Nguyen
... and I "dear Ramsay" without CCing him.. sigh.. sorry for the noise.

On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen  wrote:
>
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).
>
> Back to the APFS problem...
>
> On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> > Could you send me the "index" file in  t/trash\
> > directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> > "stat /path/to/icase/bogus/x"
> >
> > My only explanation is somehow the inode value we save is not the same
> > one on disk, which is weird and could even cause other problems. I'd
> > like to know why this happens before trying to fix anything.
>
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated. Which means this comparison
>
> sd_ino == st_ino
>
> is never true because sd_ino is truncated (0x2121063) while st_ino is
> not (0x202121063).
>
> Carlo, it would be great if you could test this patch also with
> APFS. It should fix problem. We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS. But that's a separate problem. Thank you for
> bringing this up.
>
> -- 8< --
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..809d3e2ba7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct 
> checkout *state,
>  {
> int i, trust_ino = check_stat;
>
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
> trust_ino = 0;
>  #endif
>
> ce->ce_flags |= CE_MATCHED;
>
> -   for (i = 0; i < state->istate->cache_nr; i++) {
> +   for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
> struct cache_entry *dup = state->istate->cache[i];
>
> if (dup == ce)
> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct 
> checkout *state,
> if (dup->ce_flags & (CE_MATCHED | CE_VALID | 
> CE_SKIP_WORKTREE))
> continue;
>
> -   if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> -   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> +   if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
> dup->ce_flags |= CE_MATCHED;
> +   return;
> +   }
> +   }
> +
> +   for (i = 0; i < state->istate->cache_nr; i++) {
> +   struct cache_entry *dup = state->istate->cache[i];
> +
> +   if (dup == ce)
> break;
> +
> +   if (dup->ce_flags & (CE_MATCHED | CE_VALID | 
> CE_SKIP_WORKTREE))
> +   continue;
> +
> +   if (!fspathcmp(ce->name, dup->name)) {
> +   dup->ce_flags |= CE_MATCHED;
> +   return;
> }
> }
>  }
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index f1a49e94f5..c28d51bd59 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
> )
>  '
>
> -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
> detection' '
> +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
> grep X icasefs/warning &&
> grep x icasefs/warning &&
> test_i18ngrep "the following paths have collided" icasefs/warning
> -- 8< --



-- 
Duy


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Duy Nguyen
First of all, Ramsay, it would be great if you could test the below
patch and see if it works on Cygwin. I assume since Cygwin shares the
underlying filesystem, it will share the same "no trusting inode"
issue with native builds (or it calculates inodes anyway using some
other source?).

Back to the APFS problem...

On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> Could you send me the "index" file in  t/trash\
> directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> "stat /path/to/icase/bogus/x"
> 
> My only explanation is somehow the inode value we save is not the same
> one on disk, which is weird and could even cause other problems. I'd
> like to know why this happens before trying to fix anything.

Thanks Carlo for the file and "stat" output. The problem is APFS has
64-bit inode (according to the Internet) while we store inodes as
32-bit, so it's truncated. Which means this comparison

sd_ino == st_ino

is never true because sd_ino is truncated (0x2121063) while st_ino is
not (0x202121063).

Carlo, it would be great if you could test this patch also with
APFS. It should fix problem. We will have to deal with the same
truncated inode elsewhere to make sure we index refresh performance
does not degrade on APFS. But that's a separate problem. Thank you for
bringing this up.

-- 8< --
diff --git a/entry.c b/entry.c
index 5d136c5d55..809d3e2ba7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout 
*state,
 {
int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
trust_ino = 0;
 #endif
 
ce->ce_flags |= CE_MATCHED;
 
-   for (i = 0; i < state->istate->cache_nr; i++) {
+   for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
 
if (dup == ce)
@@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout 
*state,
if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
 
-   if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
-   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+   if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
dup->ce_flags |= CE_MATCHED;
+   return;
+   }
+   }
+
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if (!fspathcmp(ce->name, dup->name)) {
+   dup->ce_flags |= CE_MATCHED;
+   return;
}
}
 }
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..c28d51bd59 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
)
 '
 
-test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
grep X icasefs/warning &&
grep x icasefs/warning &&
test_i18ngrep "the following paths have collided" icasefs/warning
-- 8< --


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Carlo Arenas
On Mon, Nov 19, 2018 at 9:23 AM Ramsay Jones
 wrote:
> ok 99 # skip colliding file detection (missing !CYGWIN of 
> !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

you need to enable this specific test first (removing !CYGWIN) so it
doesn't get skipped

Carlo


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Duy Nguyen
On Mon, Nov 19, 2018 at 6:14 PM Carlo Arenas  wrote:
>
> On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen  wrote:
> >
> > Did you test it on Mac ?
>
> macOS 10.14.1 but only using APFS, did you test my patch with HFS+?
>
> > So what exactly are you trying to fix ?
>
> I get
>
> not ok 99 - colliding file detection
> #
> # grep X icasefs/warning &&
> # grep x icasefs/warning &&
> # test_i18ngrep "the following paths have collided" icasefs/warning
> #
>
> and the output of "warning" only shows one of the conflicting files,
> instead of both:
>
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
>
>   'x'
>
> Carlo

Could you send me the "index" file in  t/trash\
directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
"stat /path/to/icase/bogus/x"

My only explanation is somehow the inode value we save is not the same
one on disk, which is weird and could even cause other problems. I'd
like to know why this happens before trying to fix anything.
-- 
Duy


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Ramsay Jones



On 19/11/2018 12:28, Torsten Bögershausen wrote:
> On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
>> While I don't have an HFS+ volume to test, I suspect this patch should be
>> needed for both, even if I have to say thay even the broken output was
>> better than the current state.
>>
>> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
>>
>> Was windows/cygwin tested?
>>
>> Carlo
>> -- >8 --
>> Subject: [PATCH] entry: fix t5061 on macOS
>>
>> b878579ae7 ("clone: report duplicate entries on case-insensitive 
>> filesystems",
>> 2018-08-17) was tested on Linux with an excemption for Windows that needs
>> to be expanded for macOS (using APFS), which then would show :
>>
>> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
>> warning: the following paths have collided (e.g. case-sensitive paths
>> on a case-insensitive filesystem) and only one from the same
>> colliding group is in the working tree:
>>
>>   'man2/_Exit.2'
>>   'man2/_exit.2'
>>   'man3/NAN.3'
>>   'man3/nan.3'
>>
>> Signed-off-by: Carlo Marcelo Arenas Belón 
>> ---
>>  entry.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/entry.c b/entry.c
>> index 5d136c5d55..3845f570f7 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
>> *state,
>>  {
>>  int i, trust_ino = check_stat;
>>  
>> -#if defined(GIT_WINDOWS_NATIVE)
>> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>>  trust_ino = 0;
>>  #endif
>>  
>>
> 
> Sorry,
> but I can't reproduce your problem here.
> 
> Did you test it on Mac ?
> If I run t5601 on a case sensitive files system
> (Mac, mounted NFS, exported from Linux)
> I get:
> ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
> !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

I tested v2.20.0-rc0 on cygwin last night and it passed just fine.
I just ran t5601-clone.sh on its own and got:

$ ./t5601-clone.sh
...
ok 98 - clone on case-insensitive fs
ok 99 # skip colliding file detection (missing !CYGWIN of 
!MINGW,!CYGWIN,CASE_INSENSITIVE_FS)
ok 100 - partial clone
ok 101 - partial clone: warn if server does not support object filtering
ok 102 - batch missing blob request during checkout
ok 103 - batch missing blob request does not inadvertently try to fetch 
gitlinks
# passed all 103 test(s)
# SKIP no web server found at '/usr/sbin/apache2'
1..103
$ 

ATB,
Ramsay Jones


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Carlo Arenas
On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen  wrote:
>
> Did you test it on Mac ?

macOS 10.14.1 but only using APFS, did you test my patch with HFS+?

> So what exactly are you trying to fix ?

I get

not ok 99 - colliding file detection
#
# grep X icasefs/warning &&
# grep x icasefs/warning &&
# test_i18ngrep "the following paths have collided" icasefs/warning
#

and the output of "warning" only shows one of the conflicting files,
instead of both:

Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'

Carlo


Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Torsten Bögershausen
On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
> While I don't have an HFS+ volume to test, I suspect this patch should be
> needed for both, even if I have to say thay even the broken output was
> better than the current state.
> 
> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
> 
> Was windows/cygwin tested?
> 
> Carlo
> -- >8 --
> Subject: [PATCH] entry: fix t5061 on macOS
> 
> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
> 2018-08-17) was tested on Linux with an excemption for Windows that needs
> to be expanded for macOS (using APFS), which then would show :
> 
> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'man2/_Exit.2'
>   'man2/_exit.2'
>   'man3/NAN.3'
>   'man3/nan.3'
> 
> Signed-off-by: Carlo Marcelo Arenas Belón 
> ---
>  entry.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..3845f570f7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
> *state,
>  {
>   int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>   trust_ino = 0;
>  #endif
>  
> 

Sorry,
but I can't reproduce your problem here.

Did you test it on Mac ?
If I run t5601 on a case sensitive files system
(Mac, mounted NFS, exported from Linux)
I get:
ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
!MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

And if I run it on a case-insensitive HFS+,
I get
ok 99 - colliding file detection

So what exactly are you trying to fix ?



[PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Carlo Marcelo Arenas Belón
While I don't have an HFS+ volume to test, I suspect this patch should be
needed for both, even if I have to say thay even the broken output was
better than the current state.

Travis seems to be using a case sensitive filesystem so wouldn't catch this.

Was windows/cygwin tested?

Carlo
-- >8 --
Subject: [PATCH] entry: fix t5061 on macOS

b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
2018-08-17) was tested on Linux with an excemption for Windows that needs
to be expanded for macOS (using APFS), which then would show :

$ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'man2/_Exit.2'
  'man2/_exit.2'
  'man3/NAN.3'
  'man3/nan.3'

Signed-off-by: Carlo Marcelo Arenas Belón 
---
 entry.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/entry.c b/entry.c
index 5d136c5d55..3845f570f7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
*state,
 {
int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
trust_ino = 0;
 #endif
 
-- 
2.20.0.rc0



Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-08-17 Thread Torsten Bögershausen
On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote:

The whole patch looks good to me.
(I was just sending a different version, but your version is better :-)

One minor remark, should the line
warning: the following paths have collided 
start with a capital letter:
Warning: the following paths have collided 

> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> In the case that we can't rely on filesystem (via inode number) to do
> this check, fall back to fspathcmp() which is not perfect but should
> not give false positives.
> 
> This patch is tested with vim-colorschemes and Sublime-Gitignore
> repositories on a JFS partition with case insensitive support on
> Linux.

Now even tested under Mac OS/HFS+

[]
>  '
>  
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
> detection' '

My ambition is to run the test under Windows (both CYGWIN and native) next week,
so that we can remove !MINGW and !CYGWIN 



Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-08-17 Thread Duy Nguyen
On Fri, Aug 17, 2018 at 10:20:36AM -0700, Junio C Hamano wrote:
> I highly suspect that the above was written in that way to reduce
> the indentation level, but the right way to reduce the indentation
> level, if it bothers readers too much, is to make the whole thing
> inside the above if (o->clone) into a dedicated helper function
> "void report_collided_checkout(void)", I would think.

I read my mind. I thought of separating into a helper function too,
but was not happy that the clearing CE_MATCHED in preparation for this
test is in check_updates(), but the cleaning up CE_MATCHED() is in the
helper function.

So here is the version that separates _both_ phases into helper
functions.

-- 8< --
Subject: [PATCH v6] clone: report duplicate entries on case-insensitive 
filesystems

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.

This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/clone.c  |  1 +
 cache.h  |  1 +
 entry.c  | 31 +++
 t/t5601-clone.sh |  8 +++-
 unpack-trees.c   | 47 +++
 unpack-trees.h   |  1 +
 6 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = _index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..8766e27255 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct 
stat *st, int skiplen)
return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+  struct cache_entry *ce, struct stat *st)
+{
+   int i, trust_ino = check_stat;
+
+#if defined(GIT_WINDOWS_NATIVE)
+   trust_ino = 0;
+#endif
+
+   ce->ce_flags |= CE_MATCHED;
+
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
 
+   if (state->clone)
+   mark_colliding_entries(state, ce, );
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" 

Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-08-17 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

>  I still don't trust magic st_ino zero, or core.checkStat being zero
>  on Windows, so the #if condition still remains but it covers smallest
>  area possible and I tested it by manually make it "#if 1"
>
>  The fallback with fspathcmp() is only done when inode can't be
>  trusted because strcmp is more expensive and when fspathcmp() learns
>  more about real world in the future, it could become even more
>  expensive.
>
>  The output sorting is the result of Sublime-Gitignore repo being
>  reported recently. It's not perfect but it should help seeing the
>  groups in normal case.

Looks small and safe.

> +
> + if (o->clone) {
> + struct string_list list = STRING_LIST_INIT_NODUP;
> + int i;
> +
> + for (i = 0; i < index->cache_nr; i++) {
> + struct cache_entry *ce = index->cache[i];
> +
> + if (!(ce->ce_flags & CE_MATCHED))
> + continue;
> +
> + string_list_append(, ce->name);
> + ce->ce_flags &= ~CE_MATCHED;
> + }
> +
> + list.cmp = fspathcmp;
> + string_list_sort();
> +
> + if (list.nr)
> + warning(_("the following paths have collided (e.g. 
> case-sensitive paths\n"
> +   "on a case-insensitive filesystem) and only 
> one from the same\n"
> +   "colliding group is in the working tree:\n"));
> +
> + for (i = 0; i < list.nr; i++)
> + fprintf(stderr, "  '%s'\n", list.items[i].string);
> +
> + string_list_clear(, 0);

I would have written the "sort, show warning, and list" all inside
"if (list.nr)" block, leaving list-clear outside, which would have
made the logic a bit cleaner.  The reader does not have to bother
thinking "ah, when list.nr==0, this is a no-op anyway" to skip them
if written that way.

I highly suspect that the above was written in that way to reduce
the indentation level, but the right way to reduce the indentation
level, if it bothers readers too much, is to make the whole thing
inside the above if (o->clone) into a dedicated helper function
"void report_collided_checkout(void)", I would think.


[PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-08-17 Thread Nguyễn Thái Ngọc Duy
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.

This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 v5 respects core.checkStat and sorts the output case-insensitively.

 I still don't trust magic st_ino zero, or core.checkStat being zero
 on Windows, so the #if condition still remains but it covers smallest
 area possible and I tested it by manually make it "#if 1"

 The fallback with fspathcmp() is only done when inode can't be
 trusted because strcmp is more expensive and when fspathcmp() learns
 more about real world in the future, it could become even more
 expensive.

 The output sorting is the result of Sublime-Gitignore repo being
 reported recently. It's not perfect but it should help seeing the
 groups in normal case.

 builtin/clone.c  |  1 +
 cache.h  |  1 +
 entry.c  | 31 +++
 t/t5601-clone.sh |  8 +++-
 unpack-trees.c   | 35 +++
 unpack-trees.h   |  1 +
 6 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = _index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..8766e27255 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct 
stat *st, int skiplen)
return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+  struct cache_entry *ce, struct stat *st)
+{
+   int i, trust_ino = check_stat;
+
+#if defined(GIT_WINDOWS_NATIVE)
+   trust_ino = 0;
+#endif
+
+   ce->ce_flags |= CE_MATCHED;
+
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
 
+   if (state->clone)
+   mark_colliding_entries(state, ce, );
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
   SERVER="$1" &&
   URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index