Aaron Sherman <a...@ajs.com> wrote:
> There's just an undefined codepoint smack in the middle of the Greek
> uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for
> that somewhere, but my guess is that there's some thousand-year-old debate
> about the Greek alphabet behind it.
It becomes clearer if you also look at the corresponding lower-case
characters:

    U+03A1 Greek capital letter rho
    U+03A2 (none)
    U+03A3 Greek capital letter sigma

    U+03C1 Greek small letter rho
    U+03C2 Greek small letter final sigma
    U+03C3 Greek small letter sigma

Greek words written in lower-case that end in a sigma use a special glyph
for that sigma, and Unicode allocates a codepoint to it for roundtripping to
legacy character sets. There isn't a corresponding upper-case final sigma.
Unicode leaves the gap in the upper-case Greek range for neatness,
effectively: adding 0x20 to the numeric value of an upper-case character
yields the corresponding lower-case version.

> I think that "Ā" .. "Ē" should ĀĂĄĆĈĊČĎĐĒ

If that's in the hope of producing a more "intuitive" result, then why not
ĀB̄C̄D̄Ē?

That's only partly serious. I'm acutely aware that choosing a baroque set
of rules makes life harder for both implementers and users (and, in
particular, risks ending up with an operator that has no practical
non-trivial use cases).

I note also that this A-macron and E-macron are in NFC. I think that,
certainly by default, the difference between NFC and NFD should be hidden
from users. That implies that, however "Ā" .. "Ē" behaves, the NFD version
should behave identically; and that "B̄" .. "F̄" should behave in the most
equivalent way possible.

-- 
Aaron Crane ** http://aaroncrane.co.uk/
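For anyone who wants to check the two points above, here is a minimal Python sketch (not taken from the thread; it uses only the standard `unicodedata` module, and the names `GREEK_CASE_OFFSET`, `describe` and `offset_table` are made up for illustration). The first half prints the rho/sigma rows with their 0x20-offset partners. The second half shows that "Ā" is one codepoint in NFC and two in NFD, while "B̄" has no precomposed form and so stays at two codepoints even after NFC normalization.

```python
import unicodedata

# Assumption for illustration: the 0x20 offset described above, applied only
# to the three codepoints discussed (rho, the gap, sigma).
GREEK_CASE_OFFSET = 0x20


def describe(cp):
    """Return 'U+XXXX NAME', with '(none)' for an unassigned codepoint."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '(none)')}"


def offset_table(uppers=(0x03A1, 0x03A2, 0x03A3)):
    """Pair each upper-case codepoint with the one 0x20 above it."""
    return [(describe(cp), describe(cp + GREEK_CASE_OFFSET)) for cp in uppers]


for upper, lower in offset_table():
    print(f"{upper:36} {lower}")

# NFC vs NFD: A-macron has a precomposed form, B-macron does not.
a_nfc = "\u0100"                                # LATIN CAPITAL LETTER A WITH MACRON
a_nfd = unicodedata.normalize("NFD", a_nfc)     # "A" + U+0304 COMBINING MACRON
b_macron = "B\u0304"                            # only representable with a combining mark
b_nfc = unicodedata.normalize("NFC", b_macron)  # NFC cannot compose it further

# Codepoint counts for the three strings
print(len(a_nfc), len(a_nfd), len(b_nfc))
```

Any range operator that works on raw codepoints would therefore see "Ā" and "B̄" as different kinds of endpoint unless it normalizes, or works on graphemes, first.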