These are using UTF-8. The problem is that there are multiple ways of representing ö in UTF-8: as the single code point U+00F6, or as a plain o followed by the combining diaeresis U+0308.
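As a rough illustration of the point above (and of the mark-ordering case in the quoted message below), here is a minimal Python sketch using the standard unicodedata module. The helper name same_name is made up for this example and is not part of ZFS or OS X; NFD and NFC are the Unicode names for what the thread calls formD and formC.

```python
import unicodedata

# Two ways of writing the same visible character "ö".
composed = "\u00f6"      # single code point U+00F6 (decimal 246)
decomposed = "o\u0308"   # "o" followed by U+0308 COMBINING DIAERESIS

# Both are valid UTF-8, but the byte sequences differ.
print(composed.encode("utf-8"))    # b'\xc3\xb6'
print(decomposed.encode("utf-8"))  # b'o\xcc\x88'
print(composed == decomposed)      # False: a plain comparison sees two names


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Hypothetical helper: compare two names after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_name(composed, decomposed))  # True

# Ordering of several marks on one base character:
# U+0323 is "dot below", U+0307 is "dot above".
below_first = "o\u0323\u0307"
above_first = "o\u0307\u0323"

print(below_first == above_first)           # False as raw strings
print(same_name(below_first, above_first))  # True once normalized
```

A file system that stores names as raw byte sequences behaves like the plain == comparison here, which is why the same visible name can end up as two distinct directory entries.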
Alex

Sent from my iPhone 5

> On 18 Mar 2014, at 01:51, Peter Lai <cowb...@gmail.com> wrote
>
> Why do I get the feeling Apple made everything worse by not sticking
> with either UTF-16 or UTF-8 encodings and POSIX collation for Finder
> etc.?
>
>> On Mon, Mar 17, 2014 at 8:18 PM, Bjoern Kahl <googlelo...@bjoern-kahl.de>
>> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>> Too late in the night, hit "send" too early :-(
>>
>> On 18.03.14 00:31, Bjoern Kahl wrote:
>>>
>>> I apologize for this being a bit longer, but I tried to really
>>> clarify what normalization is all about and how it affects ZFS on
>>> OSX.
>>>
>>> On 17.03.14 20:56, Philip Robar wrote:
>>>> On Mon, Mar 17, 2014 at 3:40 PM, Dave Cottlehuber
>>>> <d...@jsonified.com> wrote:
>>>
>>>>> On 17 March 2014 at 19:17:23, Philip Robar
>>>>> (philip.ro...@gmail.com) wrote:
>>>>>> I admit to being one whose eyes glaze over when the
>>>>>> discussion turns to i18n/l10n. So why should I use formD
>>>>>> normalization?
>>>>>
>>>>> Because (as you point out ;-) poorly written software won't
>>>>> work.
>>>>>
>>>>> iTunes is one of them, sadly.
>>>
>>>> OK, let me try again. I read a description of the various
>>>> normalization forms and despite my being a native speaker of
>>>> English I couldn't find any meaning in the words. (Something
>>>> unfortunately all too common when it comes to standards docs.)
>>>> So can you explain for the naive and mildly interested what
>>>> "formD" means?
>>>
>>> The two normalization forms "formD" and "formC" mandate how
>>> certain characters outside the standard ASCII range (A-Z, a-z, 0-9
>>> and a few punctuation characters ".,-;" and some others) are
>>> represented.
>>>
>>>
>>> For example (note: the following is not fully technically correct,
>>> but illustrates the idea), the German letter "ö", named o_umlaut,
>>> could be represented as-is, that is, as a single entity with Unicode
>>> code point number 246 (U+00F6).
>>>
>>> However, the "ö" could also be seen as a plain "o" with two dots
>>> (in printed text and modern German handwriting since 1978) or two
>>> short downward lines (in some German handwriting scripts, for
>>> example the Sütterlin script or other Kurrent scripts and
>>> handwriting taught before 1978).
>>>
>>> Similarly, the "ö" can be encoded in Unicode as a two-character
>>> sequence, a plain "o" and a modifier '"' with the meaning "put two
>>> dots above the previous character" (note: '"' is not such a
>>> modifier, it serves here as a visualization of the actual
>>> modifier).
>>>
>>>
>>> Now, a text in normalization "formC" or "composed form" would have
>>> all characters that can be represented by a single entity encoded
>>> using this single character.
>>>
>>> A text in "formD" or "decomposed form" normalization would have
>>> all characters that have some dots, accents, or other "additions"
>>> encoded using the plain base character followed by one or more
>>> modifiers.
>>>
>>> It is normalized formD if the modifiers come in a defined order;
>>> for example, if a character has a dot above and below, the modifier
>>> for "dot below" always comes first.
>>>
>>> It is in irregular formD if all characters are decomposed but
>>> the modifiers do not come in the defined order. In the example of a
>>> dot below and above a character, having the modifier for "dot
>>> above" come before the modifier for "dot below" makes the string
>>> irregular.
>>>
>>>
>>> This whole mess is important because it affects how comparing and
>>> sorting work. For example, two strings "o" + "dot_below" + "dot_above"
>>> and "o" + "dot_above" + "dot_below" should compare equal, because
>>> they carry the same information, despite the fact that they differ
>>> in their binary representation.
>>>
>>> Normalizing makes comparing and sorting easier.
>>>
>>>
>>> Normalization and ZFS and OSX
>>> =============================
>>>
>>>
>>> Why should we care?
>>>
>>> Because Finder wants to sort directory listings, and for this it
>>> needs to know how the byte sequence it gets from the VFS maps to
>>> characters and how these characters are ordered.
>>>
>>> Finder expects text like filenames to be in formD.
>>>
>>> For file systems like ZFS this means they need to do one of the
>>> following:
>>>
>>> (a) simple case: ignore encoding altogether and just deal with
>>> byte sequences. Since names are stored and returned as they arrive
>>> from the Finder & Co., no problem arises. (In practice, problems
>>> arise when using the terminal or applications that don't follow
>>> Apple's encoding rules, because names in the wrong encoding could
>>> end up on the file system.)
>>>
>>> (b) complex case: convert the internal form to and from formD when
>>> communicating with the VFS (and through it with higher levels like
>>> Finder).
>>>
>>> In case of (b) we have two implementation choices:
>>>
>>> (1) stick to the rules and really do the conversion, in both
>>> directions, and verify that whatever we get from the VFS is
>>> actually in formD (it might not be, when using the terminal or
>>> third-party applications not following Apple's encoding rules). In
>>> that case, the setting of the normalization property doesn't matter,
>>> because it controls how names are recorded *on* *disk*, and this
>>> encoding would *never* be exposed to the VFS.
>>>
>>> (2) be lazy and essentially do (a), that is present the names to
>>> VFS in the form mandated by the normalization property when
>>> reading, i.e. pass-through, but still do a best effort to force
>>> names received from the VFS into the form mandated by normalization
>>> property when writing.
>>
>> That should have read:
>>
>> (2) be lazy and essentially do (a) but require the user to set "formD"
>> as value for the normalization property and then present the names to
>> VFS in the form found on disk, but still do a best effort to force
>> names received from the VFS into the form mandated by the normalization
>> property when writing, in order not to taint a ZFS pool originating
>> from some other system.
>>
>> Obviously (b.2) isn't a real option.
>>
>>
>>
>>> I hope this answers the question and sheds some light on the
>>> problem of filename encoding.
>>>
>>>
>>> Best regards
>>>
>>> Björn
>>
>> - --
>> | Bjoern Kahl +++ Siegburg +++ Germany |
>> | "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
>> | Languages: German, English, Ancient Latin (a bit :-)) |
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iQCVAgUBUyeQtVsDv2ib9OLFAQK3MQP+JEhwtmjyAwikJ+KRMdcKOqWxy/Sf1jjG
>> z2tVkM2BM2zkZAFV+iq3W3BwWHftESiKWRObzbLkvZjEhYUYxGfCbuTfD0f4V8Ng
>> oV5vjOkoxNCi82QiCDQq04vUlCEpbp0QSojguixLpBKPM4OisPYdGqoNo510w8cx
>> J9f+G88Iw10=
>> =2Z9E
>> -----END PGP SIGNATURE-----
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "zfs-macos" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to zfs-macos+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.