Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)
Hi,

On Fri, Apr 27, 2018 at 2:45 PM, Torsten Bögershausen wrote:
> On 2018-04-26 19:23, Elijah Newren wrote:
>> Sure. First, though, note that I can make it pass (or at least "not
>> ok...TODO known breakage") with the following patch (may be
>> whitespace-damaged by gmail):
>>
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 483c8d6d7..770b91f8c 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
>>         auml=$(printf "\303\244")
>>         aumlcdiar=$(printf "\141\314\210")
>>         >"$auml" &&
>> -       case "$(echo *)" in
>> -       "$aumlcdiar")
>> -               true ;;
>> -       *)
>> -               false ;;
>> -       esac
>> +       stat "$aumlcdiar" >/dev/null 2>/dev/null
>
> Nicely analyzed and improved.
>
> The "stat" statement is technically correct.
> I think that a more git-style fix would be
> [] ---
> +       test -r "$aumlcdiar"
>
> instead of the stat.
>
> I looked into the 2 known breakages.
> In short: they test use cases which are not sooo important for a user in
> practice, but do a good test if the code is broken.
> IOW: I can't see a need for immediate action.
>
> As you already did all the analysis:
> Do you want to send a patch ?

You know, despite seeing the "test_expect_failure" and "TODO...known
breakage" with these tests and even mentioning them, it somehow didn't
sink in and I was still thinking that there might be some kind of
unicode normalization handling in the codebase somewhere (similar to
the case insensitivity handling that I've seen in a place or two) that
now needed to be extended. I should have realized that
test_expect_failure meant there wasn't, and thus all we needed to do
was to mark it as continuing to fail with the new filesystem. Should
have realized, but didn't. Oops.

Anyway, it looks like you've already submitted a patch and marked it
as having been reported by me, which is just fine. Thanks!

Elijah
Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)
On 2018-04-26 19:23, Elijah Newren wrote:
> On Thu, Apr 26, 2018 at 10:13 AM, Torsten Bögershausen wrote:
>> Hm,
>> thanks for the report.
>> I don't have a high sierra box, but I can probably get one.
>> t0050 -should- pass automagically, so I feel that I can do something.
>> Unless someone is faster of course.
>
> Sweet, thanks for taking a look.
>
>> Is it possible that you run
>> debug=t verbose=t ./t0050-filesystem.sh
>> and send the output to me ?
>
> Sure. First, though, note that I can make it pass (or at least "not
> ok...TODO known breakage") with the following patch (may be
> whitespace-damaged by gmail):
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 483c8d6d7..770b91f8c 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
>         auml=$(printf "\303\244")
>         aumlcdiar=$(printf "\141\314\210")
>         >"$auml" &&
> -       case "$(echo *)" in
> -       "$aumlcdiar")
> -               true ;;
> -       *)
> -               false ;;
> -       esac
> +       stat "$aumlcdiar" >/dev/null 2>/dev/null

Nicely analyzed and improved.

The "stat" statement is technically correct.
I think that a more git-style fix would be
[] ---
+       test -r "$aumlcdiar"

instead of the stat.

I looked into the 2 known breakages.
In short: they test use cases which are not sooo important for a user in
practice, but do a good test if the code is broken.
IOW: I can't see a need for immediate action.

As you already did all the analysis:
Do you want to send a patch ?
Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)
On Thu, Apr 26, 2018 at 10:13 AM, Torsten Bögershausen wrote:
> Hm,
> thanks for the report.
> I don't have a high sierra box, but I can probably get one.
> t0050 -should- pass automagically, so I feel that I can do something.
> Unless someone is faster of course.

Sweet, thanks for taking a look.

> Is it possible that you run
> debug=t verbose=t ./t0050-filesystem.sh
> and send the output to me ?

Sure. First, though, note that I can make it pass (or at least "not
ok...TODO known breakage") with the following patch (may be
whitespace-damaged by gmail):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 483c8d6d7..770b91f8c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
        auml=$(printf "\303\244")
        aumlcdiar=$(printf "\141\314\210")
        >"$auml" &&
-       case "$(echo *)" in
-       "$aumlcdiar")
-               true ;;
-       *)
-               false ;;
-       esac
+       stat "$aumlcdiar" >/dev/null 2>/dev/null
 '

 test_lazy_prereq AUTOIDENT '

I'm just worried there are bugs elsewhere in dealing with filesystems
like this that would need to be fixed and that this papers over them.

Anyway, the output you requested, at least for the last two failing tests, is:

expecting success:
        git mv "$aumlcdiar" "$auml" &&
        git commit -m rename

fatal: destination exists, source=ä, destination=ä
not ok 9 - rename (silent unicode normalization)
#
#       git mv "$aumlcdiar" "$auml" &&
#       git commit -m rename
#

expecting success:
        git reset --hard initial &&
        git merge topic

HEAD is now at 1b3caf6 initial
Updating 1b3caf6..2db1bf9
error: The following untracked working tree files would be overwritten by merge:
        ä
Please move or remove them before you merge.
Aborting
not ok 10 - merge (silent unicode normalization)
#
#       git reset --hard initial &&
#       git merge topic
#

# still have 1 known breakage(s)
# failed 2 among remaining 9 test(s)
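
In the test output above, "source=ä, destination=ä" looks like the same name twice, because the precomposed and decomposed spellings render identically. A minimal sketch for telling them apart when reading such logs is to dump the bytes of each name. The helper name hexbytes is hypothetical and not part of git's test suite; it assumes a POSIX od and tr.

```shell
#!/bin/sh
# hexbytes (hypothetical helper): print the bytes of a string as
# lowercase hex with no separators, so that names which render
# identically can still be told apart.
hexbytes () {
	printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
}

# The two spellings used by the UTF8_NFD_TO_NFC prerequisite.
auml=$(printf '\303\244')          # U+00E4, precomposed
aumlcdiar=$(printf '\141\314\210') # "a" followed by U+0308, decomposed

echo "auml:      $(hexbytes "$auml")"
echo "aumlcdiar: $(hexbytes "$aumlcdiar")"
```

This should print c3a4 for auml and 61cc88 for aumlcdiar, matching the byte values in the original report.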
Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)
On 26.04.18 18:48, Elijah Newren wrote:
> On HFS (which appears to be the default Mac filesystem prior to High
> Sierra), unicode names are "normalized" before recording. Thus with a
> script like:
>
> mkdir tmp
> cd tmp
>
> auml=$(printf "\303\244")
> aumlcdiar=$(printf "\141\314\210")
> >"$auml"
>
> echo "auml: " $(echo -n "$auml" | xxd)
> echo "aumlcdiar: " $(echo -n "$aumlcdiar" | xxd)
> echo "Dir contents: " $(echo -n * | xxd)
>
> echo "Stat auml: " "$(stat -f "%i %Sm %Su %N" "$auml")"
> echo "Stat aumlcdiar:" "$(stat -f "%i %Sm %Su %N" "$aumlcdiar")"
>
> We see output like:
>
> auml: : c3a4 ..
> aumlcdiar: : 61cc 88 a..
> Dir contents: : 61cc 88 a..
> Stat auml: 857473 Apr 26 09:40:40 2018 newren ä
> Stat aumlcdiar: 857473 Apr 26 09:40:40 2018 newren ä
>
> On APFS, which appears to be the new default filesystem in Mac OS High
> Sierra, we instead see:
>
> auml: : c3a4 ..
> aumlcdiar: : 61cc 88 a..
> Dir contents: : c3a4 ..
> Stat auml: 8591766636 Apr 26 09:40:59 2018 newren ä
> Stat aumlcdiar: 8591766636 Apr 26 09:40:59 2018 newren ä
>
> i.e. APFS appears to record the filename as specified by the user, but
> continues to allow the user to access it via any name that normalizes
> to the same thing. This difference causes t0050-filesystem.sh to fail
> the final two tests. I could change the "UTF8_NFD_TO_NFC" flag
> checking in test-lib.sh to instead test the exit code of stat to make
> it pass these two tests, but I have no idea if there are problems
> elsewhere that this would just be papering over.
>
> I dislike Mac OS and avoid it, so I'd prefer to find someone else
> motivated to fix this. If no one is, I may eventually try to fix this
> up...in a year or three from now. But is someone else interested?
> Would this serve as a good microproject for our microprojects list (or
> are the internals hairy enough that this is too big of a project for
> that list)?
>
>
> Elijah
>

Hm,
thanks for the report.
I don't have a high sierra box, but I can probably get one.
t0050 -should- pass automagically, so I feel that I can do something.
Unless someone is faster of course.

Is it possible that you run
debug=t verbose=t ./t0050-filesystem.sh
and send the output to me ?
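
The stat lines in the quoted report show both spellings resolving to the same inode number, but "stat -f" is BSD syntax and will not run unchanged elsewhere. A rough portable sketch of the same comparison, using "ls -di" instead, could look like the following. The helper names inode_of and same_file are hypothetical, and the sketch assumes both paths live on the same filesystem, since it compares inode numbers only and ignores device numbers.

```shell
#!/bin/sh
# inode_of (hypothetical helper): print the inode number of a path,
# or nothing if the path cannot be listed.
inode_of () {
	ls -di -- "$1" 2>/dev/null | awk '{ print $1; exit }'
}

# same_file (hypothetical helper): succeed if both paths resolve to
# the same inode. Assumption: both paths are on one filesystem, so
# comparing inode numbers alone is enough.
same_file () {
	a=$(inode_of "$1")
	b=$(inode_of "$2")
	test -n "$a" && test "$a" = "$b"
}
```

With the variables from the report, same_file "$auml" "$aumlcdiar" would be expected to succeed on both HFS and APFS as described there, and to fail on a filesystem that treats the two spellings as unrelated names.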
BUG report: unicode normalization on APFS (Mac OS High Sierra)
On HFS (which appears to be the default Mac filesystem prior to High
Sierra), unicode names are "normalized" before recording. Thus with a
script like:

mkdir tmp
cd tmp

auml=$(printf "\303\244")
aumlcdiar=$(printf "\141\314\210")
>"$auml"

echo "auml: " $(echo -n "$auml" | xxd)
echo "aumlcdiar: " $(echo -n "$aumlcdiar" | xxd)
echo "Dir contents: " $(echo -n * | xxd)

echo "Stat auml: " "$(stat -f "%i %Sm %Su %N" "$auml")"
echo "Stat aumlcdiar:" "$(stat -f "%i %Sm %Su %N" "$aumlcdiar")"

We see output like:

auml: : c3a4 ..
aumlcdiar: : 61cc 88 a..
Dir contents: : 61cc 88 a..
Stat auml: 857473 Apr 26 09:40:40 2018 newren ä
Stat aumlcdiar: 857473 Apr 26 09:40:40 2018 newren ä

On APFS, which appears to be the new default filesystem in Mac OS High
Sierra, we instead see:

auml: : c3a4 ..
aumlcdiar: : 61cc 88 a..
Dir contents: : c3a4 ..
Stat auml: 8591766636 Apr 26 09:40:59 2018 newren ä
Stat aumlcdiar: 8591766636 Apr 26 09:40:59 2018 newren ä

i.e. APFS appears to record the filename as specified by the user, but
continues to allow the user to access it via any name that normalizes
to the same thing. This difference causes t0050-filesystem.sh to fail
the final two tests. I could change the "UTF8_NFD_TO_NFC" flag
checking in test-lib.sh to instead test the exit code of stat to make
it pass these two tests, but I have no idea if there are problems
elsewhere that this would just be papering over.

I dislike Mac OS and avoid it, so I'd prefer to find someone else
motivated to fix this. If no one is, I may eventually try to fix this
up...in a year or three from now. But is someone else interested?
Would this serve as a good microproject for our microprojects list (or
are the internals hairy enough that this is too big of a project for
that list)?

Elijah
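
The two outputs in the report differ only in what the directory listing returns, while lookup by either spelling works in both. A minimal sketch that separates the cases might look like this. The function name classify_normalization and its three labels are hypothetical, mktemp is assumed to be available (it is not in POSIX), and a filesystem behaving in some fourth way would be reported under whichever branch happens to match.

```shell
#!/bin/sh
# classify_normalization (hypothetical helper): create a file with a
# precomposed name in a scratch directory under $1 (default: .) and
# report how the filesystem treats it. Prints one of:
#   nfd-stored       listing returns the decomposed spelling
#                    (the HFS output in the report)
#   alias-preserved  listing returns the name as created, but the
#                    decomposed spelling still finds the file
#                    (the APFS output in the report)
#   none             the decomposed spelling does not find the file
classify_normalization () (
	dir=$(mktemp -d "${1:-.}/norm.XXXXXX") || exit 1
	auml=$(printf '\303\244')          # precomposed
	aumlcdiar=$(printf '\141\314\210') # decomposed
	: >"$dir/$auml"
	# The scratch directory holds exactly one file, so the glob
	# expands to the name as the filesystem stored it.
	listed=$(cd "$dir" && echo *)
	if test "$listed" = "$aumlcdiar"
	then
		result=nfd-stored
	elif test -r "$dir/$aumlcdiar"
	then
		result=alias-preserved
	else
		result=none
	fi
	rm -rf "$dir"
	echo "$result"
)

classify_normalization "${TMPDIR:-/tmp}"
```

The first branch corresponds to the listing-based check in the existing UTF8_NFD_TO_NFC prerequisite, and the second to a lookup-based check such as testing the exit code of stat.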