I did choose a bad example, but as you say, normalization is not preserved in the way you wanted.
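For anyone who wants to check this, here is a minimal sketch using the eszett-acute string from the quoted message below. It assumes Python 3, with the standard unicodedata module for NFC and str.casefold() as a stand-in for the case folding under discussion. The helper names are made up for illustration, and the results depend on the Unicode data shipped with the interpreter.

```python
import unicodedata


def nfc_then_fold(s: str) -> str:
    """Normalize to NFC first, then case-fold (no renormalization afterwards)."""
    return unicodedata.normalize("NFC", s).casefold()


def fold_then_nfc(s: str) -> str:
    """Case-fold first, then normalize the folded text to NFC."""
    return unicodedata.normalize("NFC", s.casefold())


def is_nfc(s: str) -> bool:
    """True if s is already in NFC, i.e. normalizing leaves it unchanged."""
    return unicodedata.normalize("NFC", s) == s


if __name__ == "__main__":
    eszett_acute = "\u00DF\u0301"  # sharp s followed by combining acute
    for label, result in (
        ("casefold(NFC(x))", nfc_then_fold(eszett_acute)),
        ("NFC(casefold(x))", fold_then_nfc(eszett_acute)),
    ):
        # ascii() shows the code points as escapes so the two results can be compared
        print(label, ascii(result), "is NFC:", is_nfc(result))
```

With this input the two orders give different strings, and the fold-after-NFC result is not itself in NFC, so a caller who needs normalized output has to normalize again after folding.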
Yes, the reason the iota subscript has a special combining class is to put it at the end.

As to whether text should be normalized before or after casefolding (or other case transformations) or both: I'd have to look at it in more detail. It was not intended to be an invariant that case operations preserve NF*, nor that case operations and NF* be commutative, although we may work towards that end.

Mark
—————
Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "David Hopwood" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, October 30, 2001 00:28
Subject: Re: casefold o NFC = NFC o casefold?

> Mark Davis wrote:
> > Sadly, case mapping does not preserve any normalization formats under a
> > character-by-character transformation. The simplest example is the string
> > \u1FB2\u0300. A character-by-character titlecase conversion produces:
> > \u1FBA\u0345\u0300.
>
> \u1FB2\u0300 isn't NFC-normalised; its NFC-normalised form is \u1F70\u0345
> (alpha-varia-ypogegrammeni). Also, note that I said case-folding, not
> mapping to uppercase, lowercase or titlecase. However, I see your point -
> this example does demonstrate that my conjecture is false:
>
>   casefold(NFC("\u1FB2\u0300")) = casefold("\u1F70\u0345")
>                                 = "\u1F70\u03B9" (alpha-varia, iota)
>
>   NFC(casefold("\u1FB2\u0300")) = NFC("\u1F70\u03B9\u0300")
>                                 = "\u1F70\u1F76" (alpha, iota-varia)
>
> The intuitively correct result is the first one; the varia should
> definitely be over the alpha, not the iota. I assume that's partly why
> U+0345 (ypogegrammeni) has the highest combining class number. Doesn't
> this imply that there should be a note in UTR #21 saying that NFC or
> NFD normalisation should be done before case-folding?
>
> "\u1F70\u03B9" is NFC-normalised, though, so that is not a contradiction
> to case-folding preserving normalisation. OTOH, this is:
>
>   casefold(NFC("\u00DF\u0301")) = casefold("\u00DF\u0301")
>                                 = "ss\u0301" (s, s-acute)
>
>   NFC(casefold(NFC("\u00DF\u0301"))) = NFC("ss\u0301")
>                                      = "s\u015B" (s, s-acute)
>
> (not that "\u00DF\u0301" (eszett-acute) will occur in practice).
>
> --
> David Hopwood <[EMAIL PROTECTED]>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip