Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-30 Thread Elijah Newren
Hi,

On Fri, Apr 27, 2018 at 2:45 PM, Torsten Bögershausen wrote:
> On 2018-04-26 19:23, Elijah Newren wrote:

>> Sure.  First, though, note that I can make it pass (or at least "not
>> ok...TODO known breakage") with the following patch (may be
>> whitespace-damaged by gmail):
>>
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 483c8d6d7..770b91f8c 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
>>  auml=$(printf "\303\244")
>>  aumlcdiar=$(printf "\141\314\210")
>>  >"$auml" &&
>> -   case "$(echo *)" in
>> -   "$aumlcdiar")
>> -   true ;;
>> -   *)
>> -   false ;;
>> -   esac
>> +   stat "$aumlcdiar" >/dev/null 2>/dev/null
>
>
> Nicely analyzed and improved.
>
> The "stat" statement is technically correct.
> I think that a more git-style fix would be
> [] ---
> +   test -r "$aumlcdiar"
>
> instead of the stat.
>
> I looked into the 2 known breakages.
> In short: they test use cases which are not sooo important for a user in
> practice, but do a good test if the code is broken.
> IOW: I can't see a need for immediate action.
>
> As you already did all the analysis:
> Do you want to send a patch ?

You know, despite seeing the "test_expect_failure" and "TODO...known
breakage" with these tests and even mentioning them, it somehow didn't
sink in and I was still thinking that there might be some kind of
unicode normalization handling in the codebase somewhere (similar to
the case insensitivity handling that I've seen in a place or two) that
now needed to be extended.  I should have realized that
test_expect_failure meant there wasn't, and thus all we needed to do
was to mark it as continuing to fail with the new filesystem.  Should
have realized, but didn't.  Oops.

Anyway, it looks like you've already submitted a patch and marked it
as having been reported by me, which is just fine.  Thanks!

Elijah


Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-27 Thread Torsten Bögershausen



On 2018-04-26 19:23, Elijah Newren wrote:
> On Thu, Apr 26, 2018 at 10:13 AM, Torsten Bögershausen wrote:
>> Hm,
>> thanks for the report.
>> I don't have a high sierra box, but I can probably get one.
>> t0050 -should- pass automagically, so I feel that I can do something.
>> Unless someone is faster of course.
>
> Sweet, thanks for taking a look.
>
>> Is it possible that you run
>> debug=t verbose=t ./t0050-filesystem.sh
>> and send the output to me ?
>
> Sure.  First, though, note that I can make it pass (or at least "not
> ok...TODO known breakage") with the following patch (may be
> whitespace-damaged by gmail):
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 483c8d6d7..770b91f8c 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
>  auml=$(printf "\303\244")
>  aumlcdiar=$(printf "\141\314\210")
>  >"$auml" &&
> -   case "$(echo *)" in
> -   "$aumlcdiar")
> -   true ;;
> -   *)
> -   false ;;
> -   esac
> +   stat "$aumlcdiar" >/dev/null 2>/dev/null


Nicely analyzed and improved.

The "stat" statement is technically correct.
I think that a more git-style fix would be
[] ---
+   test -r "$aumlcdiar"

instead of the stat.
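
For illustration only, the "test -r" variant of the check could be
sketched as a standalone function outside the test suite.  This is a
rough sketch, not the code from t/test-lib.sh: the function name
probe_nfd_nfc_alias and the scratch-directory handling are assumptions
made for the example, and it probes whichever filesystem holds the
scratch directory.

```shell
# Hypothetical standalone probe (name and temp-dir handling are
# illustrative, not part of git's test-lib.sh).
# Returns 0 if a file created under its precomposed (NFC) name can
# also be reached under the decomposed (NFD) name, 1 if it cannot,
# and 2 if the scratch directory could not be created.
probe_nfd_nfc_alias () {
	auml=$(printf '\303\244')          # U+00E4, precomposed
	aumlcdiar=$(printf '\141\314\210') # "a" followed by U+0308
	dir=$(mktemp -d "${TMPDIR:-/tmp}/nfd-probe.XXXXXX") || return 2
	status=0
	(
		cd "$dir" &&
		: >"$auml" &&
		test -r "$aumlcdiar"
	) || status=$?
	rm -rf "$dir"
	return $status
}
```

Calling "probe_nfd_nfc_alias && echo aliased" would then print
"aliased" only where the two spellings refer to the same file.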

I looked into the 2 known breakages.
In short: they test use cases which are not sooo important for a user in 
practice, but do a good test if the code is broken.

IOW: I can't see a need for immediate action.

As you already did all the analysis:
Do you want to send a patch ?


Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-26 Thread Elijah Newren
On Thu, Apr 26, 2018 at 10:13 AM, Torsten Bögershausen  wrote:
> Hm,
> thanks for the report.
> I don't have a high sierra box, but I can probably get one.
> t0050 -should- pass automagically, so I feel that I can do something.
> Unless someone is faster of course.

Sweet, thanks for taking a look.

> Is it possible that  you run
> debug=t verbose=t ./t0050-filesystem.sh
> and send the output to me ?

Sure.  First, though, note that I can make it pass (or at least "not
ok...TODO known breakage") with the following patch (may be
whitespace-damaged by gmail):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 483c8d6d7..770b91f8c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
 auml=$(printf "\303\244")
 aumlcdiar=$(printf "\141\314\210")
 >"$auml" &&
-   case "$(echo *)" in
-   "$aumlcdiar")
-   true ;;
-   *)
-   false ;;
-   esac
+   stat "$aumlcdiar" >/dev/null 2>/dev/null
 '

 test_lazy_prereq AUTOIDENT '


I'm just worried there are bugs elsewhere in dealing with filesystems
like this that would need to be fixed and that this papers over them.

Anyway, the output you requested, at least for the last two failing tests, is:


expecting success:
git mv "$aumlcdiar" "$auml" &&
git commit -m rename

fatal: destination exists, source=ä, destination=ä
not ok 9 - rename (silent unicode normalization)

#
# git mv "$aumlcdiar" "$auml" &&
# git commit -m rename
#

expecting success:
git reset --hard initial &&
git merge topic

HEAD is now at 1b3caf6 initial
Updating 1b3caf6..2db1bf9
error: The following untracked working tree files would be overwritten by merge:
ä
Please move or remove them before you merge.
Aborting
not ok 10 - merge (silent unicode normalization)

#
# git reset --hard initial &&
# git merge topic
#

# still have 1 known breakage(s)
# failed 2 among remaining 9 test(s)


Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-26 Thread Torsten Bögershausen
On 26.04.18 18:48, Elijah Newren wrote:
> On HFS (which appears to be the default Mac filesystem prior to High
> Sierra), unicode names are "normalized" before recording.  Thus with a
> script like:
> 
> mkdir tmp
> cd tmp
> 
> auml=$(printf "\303\244")
> aumlcdiar=$(printf "\141\314\210")
> >"$auml"
> 
> echo "auml:  " $(echo -n "$auml" | xxd)
> echo "aumlcdiar: " $(echo -n "$aumlcdiar" | xxd)
> echo "Dir contents:  " $(echo -n * | xxd)
> 
> echo "Stat auml: " "$(stat -f "%i   %Sm   %Su %N" "$auml")"
> echo "Stat aumlcdiar:" "$(stat -f "%i   %Sm   %Su %N" "$aumlcdiar")"
> 
> We see output like:
> 
> auml:   : c3a4 ..
> aumlcdiar:  : 61cc 88 a..
> Dir contents:   : 61cc 88 a..
> Stat auml:  857473   Apr 26 09:40:40 2018   newren ä
> Stat aumlcdiar: 857473   Apr 26 09:40:40 2018   newren ä
> 
> On APFS, which appears to be the new default filesystem in Mac OS High
> Sierra, we instead see:
> 
> auml:   : c3a4 ..
> aumlcdiar:  : 61cc 88 a..
> Dir contents:   : c3a4 ..
> Stat auml:  8591766636   Apr 26 09:40:59 2018   newren ä
> Stat aumlcdiar: 8591766636   Apr 26 09:40:59 2018   newren ä
> 
> i.e. APFS appears to record the filename as specified by the user, but
> continues to allow the user to access it via any name that normalizes
> to the same thing.  This difference causes t0050-filesystem.sh to fail
> the final two tests.  I could change the "UTF8_NFD_TO_NFC" flag
> checking in test-lib.sh to instead test the exit code of stat to make
> it pass these two tests, but I have no idea if there are problems
> elsewhere that this would just be papering over.
> 
> I dislike Mac OS and avoid it, so I'd prefer to find someone else
> motivated to fix this.  If no one is, I may eventually try to fix this
> up...in a year or three from now.  But is someone else interested?
> Would this serve as a good microproject for our microprojects list (or
> are the internals hairy enough that this is too big of a project for
> that list)?
> 
> 
> Elijah
> 

Hm,
thanks for the report.
I don't have a high sierra box, but I can probably get one.
t0050 -should- pass automagically, so I feel that I can do something.
Unless someone is faster of course.

Is it possible that  you run
debug=t verbose=t ./t0050-filesystem.sh 
and send the output to me ?





BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-26 Thread Elijah Newren
On HFS (which appears to be the default Mac filesystem prior to High
Sierra), unicode names are "normalized" before recording.  Thus with a
script like:

mkdir tmp
cd tmp

auml=$(printf "\303\244")
aumlcdiar=$(printf "\141\314\210")
>"$auml"

echo "auml:  " $(echo -n "$auml" | xxd)
echo "aumlcdiar: " $(echo -n "$aumlcdiar" | xxd)
echo "Dir contents:  " $(echo -n * | xxd)

echo "Stat auml: " "$(stat -f "%i   %Sm   %Su %N" "$auml")"
echo "Stat aumlcdiar:" "$(stat -f "%i   %Sm   %Su %N" "$aumlcdiar")"

We see output like:

auml:   : c3a4 ..
aumlcdiar:  : 61cc 88 a..
Dir contents:   : 61cc 88 a..
Stat auml:  857473   Apr 26 09:40:40 2018   newren ä
Stat aumlcdiar: 857473   Apr 26 09:40:40 2018   newren ä

On APFS, which appears to be the new default filesystem in Mac OS High
Sierra, we instead see:

auml:   : c3a4 ..
aumlcdiar:  : 61cc 88 a..
Dir contents:   : c3a4 ..
Stat auml:  8591766636   Apr 26 09:40:59 2018   newren ä
Stat aumlcdiar: 8591766636   Apr 26 09:40:59 2018   newren ä

i.e. APFS appears to record the filename as specified by the user, but
continues to allow the user to access it via any name that normalizes
to the same thing.  This difference causes t0050-filesystem.sh to fail
the final two tests.  I could change the "UTF8_NFD_TO_NFC" flag
checking in test-lib.sh to instead test the exit code of stat to make
it pass these two tests, but I have no idea if there are problems
elsewhere that this would just be papering over.
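
To make the three possible behaviours easier to tell apart, here is a
minimal sketch that combines the directory-listing check with the
lookup check.  The function name classify_unicode_fs and the three
labels it prints are made up for this example; it only looks at the
filesystem holding the scratch directory.

```shell
# Hypothetical helper; the name and the printed labels are illustrative.
# Creates a file under its precomposed (NFC) name, then reports:
#   normalizing           - the directory listing shows the NFD name
#   preserving-with-alias - the listing keeps NFC, but NFD still resolves
#   distinct              - the NFD name does not resolve at all
classify_unicode_fs () {
	nfc=$(printf '\303\244')     # U+00E4, precomposed
	nfd=$(printf '\141\314\210') # "a" followed by U+0308
	dir=$(mktemp -d "${TMPDIR:-/tmp}/nfd-probe.XXXXXX") || return 1
	status=0
	result=$(
		cd "$dir" || exit 1
		: >"$nfc" || exit 1
		# The directory holds exactly one entry, so the glob
		# expands to the name as the filesystem stored it.
		listed=$(echo *)
		if test "$listed" = "$nfd"
		then
			echo normalizing
		elif test -e "$nfd"
		then
			echo preserving-with-alias
		else
			echo distinct
		fi
	) || status=$?
	rm -rf "$dir"
	test "$status" -eq 0 && echo "$result"
}
```

Going by the outputs above, I would expect HFS to be reported as
"normalizing" and APFS as "preserving-with-alias"; a filesystem that
treats the two byte sequences as unrelated names would come out as
"distinct".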

I dislike Mac OS and avoid it, so I'd prefer to find someone else
motivated to fix this.  If no one is, I may eventually try to fix this
up...in a year or three from now.  But is someone else interested?
Would this serve as a good microproject for our microprojects list (or
are the internals hairy enough that this is too big of a project for
that list)?


Elijah