These are using UTF-8. The problem is that there are multiple ways of
representing ö in UTF-8.
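
To make that concrete, here is a minimal Python sketch (my own illustration, not something from the thread; the variable names are made up). It shows that the precomposed and the decomposed spelling of ö are both valid UTF-8 but produce different byte sequences:

```python
# Two Unicode spellings of "ö"; both are valid text and valid UTF-8.
precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # plain "o" followed by U+0308 COMBINING DIAERESIS

precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(precomposed_bytes.hex())    # c3b6
print(decomposed_bytes.hex())     # 6fcc88
print(precomposed == decomposed)  # False: same letter on screen, different data
```

A file system that only compares bytes will treat these as two different names.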

Alex

Sent from my iPhone 5

> On 18 Mar 2014, at 01:51, Peter Lai <cowb...@gmail.com> wrote
> 
> why do I get the feeling apple made everything worse by not sticking
> with either UTF-16 or UTF-8 encodings and posix collation for Finder
> etc.?
> 
>> On Mon, Mar 17, 2014 at 8:18 PM, Bjoern Kahl <googlelo...@bjoern-kahl.de> 
>> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> 
>> Too late in the night, hit "send" too early :-(
>> 
>> Am 18.03.14 00:31, schrieb Bjoern Kahl:
>>> 
>>> I apologize for this being a bit longer, but I tried to really
>>> clarify what normalization is all about and how it affects ZFS on
>>> OSX.
>>> 
>>> Am 17.03.14 20:56, schrieb Philip Robar:
>>>> On Mon, Mar 17, 2014 at 3:40 PM, Dave Cottlehuber
>>>> <d...@jsonified.com> wrote:
>>> 
>>>>> On 17. März 2014 at 19:17:23, Philip Robar
>>>>> (philip.ro...@gmail.com) wrote:
>>>>>> I admit to being one whose eyes glaze over when the
>>>>>> discussion turns to i18n/l10n. So why should I use formD
>>>>>> normalization?
>>>>> 
>>>>> Because (as you point out ;-) poorly written software won't
>>>>> work.
>>>>> 
>>>>> iTunes is one of them, sadly.
>>> 
>>>> OK, let me try again. I read a description of the various
>>>> normalization forms and despite my being a native speaker of
>>>> English I couldn't find any meaning in the words. (Something,
>>>> unfortunately all too common when it comes to standards docs.)
>>>> So can you explain for the naive and mildly interested what
>>>> "formD" means?
>>> 
>>> The two normalization forms "formD" and "formC" mandate how
>>> certain characters outside the standard ASCII range (A-Z, a-z, 0-9,
>>> a few punctuation characters such as ".,-;" and some others) are
>>> represented.
>>> 
>>> 
>>> For example (note: the following is not fully technically correct,
>>> but illustrates the idea), the German letter "ö", named o_umlaut,
>>> could be represented as-is, that is as a single entity with Unicode
>>> code point number 246 (U+00F6).
>>> 
>>> However, the "ö" could also be seen as a plain "o" with two dots
>>> (in printed text and modern German handwriting since 1978) or two
>>> short downward lines (in some German handwriting scripts, for
>>> example the Sütterlin script or other Kurrent scripts and
>>> handwriting taught before 1978).
>>> 
>>> Similarly, the "ö" can be encoded in Unicode by a two-character
>>> sequence, a plain "o" and a modifier '"' with the meaning "put two
>>> dots above the previous character" (note: '"' is not such a
>>> modifier, it serves here as a visualization of the actual
>>> modifier, U+0308 COMBINING DIAERESIS).
>>> 
>>> 
>>> Now, a text in normalization "formC" or "composed form" would have
>>> all characters that can be represented by a single entity encoded
>>> using that single entity.
>>> 
>>> A text in "formD" or "decomposed form" normalization would have
>>> all characters that have some dots, accents, or other "additions"
>>> encoded using the plain base character followed by one or more
>>> modifiers.
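
A small sketch of the two forms, assuming that "formC" and "formD" correspond to what the Unicode standard calls NFC and NFD. It uses Python's standard `unicodedata` module; the sample name "Köln" is only an example of mine:

```python
import unicodedata

name = "K\u00f6ln"  # "Köln" written with the precomposed ö (U+00F6)

nfc = unicodedata.normalize("NFC", name)  # composed form ("formC")
nfd = unicodedata.normalize("NFD", name)  # decomposed form ("formD")

# Show the code points that make up each form.
print(len(nfc), [hex(ord(c)) for c in nfc])
print(len(nfd), [hex(ord(c)) for c in nfd])

# Converting back and forth loses nothing.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

The decomposed string is one code point longer, because the ö has become "o" plus the combining diaeresis.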
>>> 
>>> It is in normalized formD if the modifiers come in a defined order.
>>> For example, if a character has a dot above and below, the modifier
>>> for "dot below" always comes first.
>>> 
>>> It is in irregular formD if all characters are decomposed but the
>>> modifiers do not come in the defined order. In the example of a
>>> character with a dot below and above, having the modifier for "dot
>>> above" come before the modifier for "dot below" makes the string
>>> irregular.
>>> 
>>> 
>>> This whole mess is important, because it affects how sorting
>>> works. For example, two strings "o" + "dot_below" + "dot_above"
>>> and "o" + "dot_above" + "dot_below" should compare equal, because
>>> they carry the same information, despite the fact that they differ
>>> in their binary representation.
>>> 
>>> Normalizing makes comparing and sorting easier.
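
The dot_below / dot_above example can be tried directly. This is only a sketch: I am assuming the two modifiers meant here are U+0323 COMBINING DOT BELOW and U+0307 COMBINING DOT ABOVE, and the variable names are mine:

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, canonical combining class 230

below_first = "o" + DOT_BELOW + DOT_ABOVE
above_first = "o" + DOT_ABOVE + DOT_BELOW

# As raw strings the two differ, although they carry the same information.
print(below_first == above_first)  # False

# NFD reorders modifiers by combining class, so "dot below" ends up first.
nfd_a = unicodedata.normalize("NFD", below_first)
nfd_b = unicodedata.normalize("NFD", above_first)
print(nfd_a == nfd_b)              # True
print(nfd_b == below_first)        # True
```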
>>> 
>>> 
>>> Normalization and ZFS and OSX
>>> =============================
>>> 
>>> 
>>> Why should we care?
>>> 
>>> Because Finder wants to sort directory listings, and for this needs
>>> to know how the byte sequence it gets from the VFS maps to
>>> written symbols and how these symbols are ordered.
>>> 
>>> Finder expects text like filenames to be in formD.
>>> 
>>> For file systems like ZFS this means they need to
>>> 
>>> (a) simple case: ignore encoding altogether and just deal with
>>> byte sequences.  Since names are stored and returned as they arrive
>>> from the Finder & Co., no problem arises.  (In practice, problems
>>> arise when using the terminal or applications that don't follow
>>> Apple's encoding rules, because names in the wrong encoding could
>>> end up on the file system.)
>>> 
>>> (b) complex case: Convert the internal form to and from formD when
>>> communicating with the VFS (and through it with higher levels like
>>> Finder)
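
The difference between (a) and (b) can be sketched with a toy in-memory directory. Everything here is hypothetical (the class names, the file name, the idea of a dict as a directory) and has nothing to do with the real ZFS code. For simplicity the second class also keeps its table in formD, whereas a real (b) could store any internal form:

```python
import unicodedata


class PassThroughDir:
    """Option (a): names are opaque bytes, stored and compared as they arrive."""

    def __init__(self):
        self.entries = {}

    def create(self, name, data):
        self.entries[name.encode("utf-8")] = data

    def lookup(self, name):
        return self.entries.get(name.encode("utf-8"))


class NormalizingDir(PassThroughDir):
    """Option (b), simplified: convert every name to formD (NFD) first."""

    def create(self, name, data):
        super().create(unicodedata.normalize("NFD", name), data)

    def lookup(self, name):
        return super().lookup(unicodedata.normalize("NFD", name))


typed_in_terminal = "Bj\u00f6rn.txt"  # precomposed ö
sent_by_finder = "Bjo\u0308rn.txt"    # decomposed ö, the form Finder expects

plain = PassThroughDir()
plain.create(typed_in_terminal, "hello")
print(plain.lookup(sent_by_finder))        # None: the byte sequences differ

normalizing = NormalizingDir()
normalizing.create(typed_in_terminal, "hello")
print(normalizing.lookup(sent_by_finder))  # hello
```

With pass-through, a file created under the "wrong" spelling cannot be found under the other one, which is the practical problem described in (a).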
>>> 
>>> In case of (b) we have two implementation choices:
>>> 
>>> (1) stick to the rules and really do the conversion, in both
>>> directions, and verify that whatever we get from the VFS is
>>> actually in formD (it might not be, when using the terminal or 3rd
>>> party applications not following Apple's encoding rules).  In that
>>> case, the setting of the normalization property doesn't matter,
>>> because it controls how names are recorded *on* *disk*, and this
>>> encoding would *never* be exposed to the VFS.
>>> 
>>> (2) be lazy and essentially do (a), that is present the names to
>>> VFS in the form mandated by the normalization property when
>>> reading, i.e. pass-through, but still do a best effort to force
>>> names received from the VFS into the form mandated by normalization
>>> property when writing.
>> 
>> That should have read:
>> 
>> (2) be lazy and essentially do (a) but require the user to set "formD"
>> as value for the normalization property and then present the names to
>> VFS in the form found on disk, but still do a best effort to force
>> names received from the VFS into the form mandated by normalization
>> property when writing, in order not to taint a ZFS pool originating
>> from some other system.
>> 
>> Obviously (b.2) isn't a real option.
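
The "verify what we get from the VFS" step of (b.1) might look roughly like the following. This is a sketch under assumptions: `accept_from_vfs` is a made-up name, formD is taken to mean NFD, and `unicodedata.is_normalized` needs Python 3.8 or later:

```python
import unicodedata


def accept_from_vfs(name):
    """Hypothetical entry point: return the name in formD, converting if needed."""
    if unicodedata.is_normalized("NFD", name):
        return name  # already decomposed and canonically ordered
    return unicodedata.normalize("NFD", name)


from_finder = "Bjo\u0308rn"    # already decomposed
from_terminal = "Bj\u00f6rn"   # precomposed, e.g. typed in a terminal

print(accept_from_vfs(from_finder) == from_finder)    # True, passed through
print(accept_from_vfs(from_terminal) == from_finder)  # True, converted
```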
>> 
>> 
>> 
>>> I hope this answers the question and sheds some light on the
>>> problem of filename encoding.
>>> 
>>> 
>>> Best regards
>>> 
>>> Björn
>> 
>> - --
>> |     Bjoern Kahl   +++   Siegburg   +++    Germany     |
>> | "googlelogin@-my-domain-"   +++   www.bjoern-kahl.de  |
>> | Languages: German, English, Ancient Latin (a bit :-)) |
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> 
>> iQCVAgUBUyeQtVsDv2ib9OLFAQK3MQP+JEhwtmjyAwikJ+KRMdcKOqWxy/Sf1jjG
>> z2tVkM2BM2zkZAFV+iq3W3BwWHftESiKWRObzbLkvZjEhYUYxGfCbuTfD0f4V8Ng
>> oV5vjOkoxNCi82QiCDQq04vUlCEpbp0QSojguixLpBKPM4OisPYdGqoNo510w8cx
>> J9f+G88Iw10=
>> =2Z9E
>> -----END PGP SIGNATURE-----
>> 
>> --
>> 
>> ---
>> You received this message because you are subscribed to the Google Groups 
>> "zfs-macos" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to zfs-macos+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
