This is a "feature" of the Greek alphabet: the lowercase iota subscript can be capitalized in two different ways, either as a subscript below the uppercase main letter, or as a standard capital iota. The subscript form is a combining character, but the non-subscript form is not. There should exist a special contextual rule for language-specific casing; there is already one for the final sigma, but not for the iota.
This is not easy to handle, and in fact the choice of case mapping is not specifically a linguistic rule but a rendering style rule. For carved inscriptions, which generally use only capitals, the combining forms are generally avoided and a reduced alphabet is used. For handwritten and cursive styles, the extended alphabet is used, and this enables contextual forms including the small iota subscript, the final small sigma and many combining signs (this also allows other placement rules for accents). For printing or display there is no rule: the document author enables or disables the extended alphabet (it is generally disabled for rendering at low resolutions).
The simple case mappings, however, should preserve the distinctions present in the extended alphabet: simple uppercasing of text should not convert a lowercase letter to an uppercase letter with an appended uppercase iota, even if this means mapping a lowercase letter to a titlecase one (that conversion would be lossy, and simple casing rules should be lossless). The case mappings in the main UCD, however, ignore the contextual rules and the language-specific and style-specific rules. Even if they are wrong, this cannot be changed. The simple mappings in the main UCD file should not be assumed to be lossless.
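To make the titlecase point concrete, here is a minimal Python sketch using only the standard unicodedata module. It assumes that the Unicode database bundled with the interpreter agrees with UCD 7.0.0 for these characters, and the helper name greek_titlecase_letters is hypothetical, introduced only for this illustration.

```python
import unicodedata

def greek_titlecase_letters():
    """Return the Greek Extended characters whose General_Category is Lt."""
    return [chr(cp) for cp in range(0x1F00, 0x2000)
            if unicodedata.category(chr(cp)) == "Lt"]

small = "\u1f80"    # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
capital = "\u1f88"  # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI

# The "capital" with prosgegrammeni is a titlecase letter (Lt), not Lu.
print(unicodedata.category(small))
print(unicodedata.category(capital))

# Titlecasing the small letter yields that Lt character.
print(small.title() == capital)

# Number of Lt characters in the Greek Extended block.
print(len(greek_titlecase_letters()))
```

The characters collected by the helper are the ones with a prosgegrammeni in their name; since their General_Category is Lt rather than Lu, a property derived from Lu would not be expected to include them.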
Actual case mappers do not use just these basic rules, which are only the most frequently assumed mappings. In any case, every kind of case conversion introduces some loss; the degree of loss varies when mappings involve more than a single pair of simple letters. See also the old difficulties with the German Eszett (sharp s), and with the many ligatures that became plain letters in some contexts, including the ampersand "&", which originates from the "et" ligature, or the German umlaut, which inherits some old behavior of the superscript small Latin letter "e", which behaves like the Greek iota subscript in Fraktur font styles.
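As a rough illustration of that loss, the following Python sketch round-trips a few characters through the language's built-in upper() and lower(), which apply full (multi-character) case mappings rather than the simple ones. The helper name roundtrips and the sample labels are hypothetical, chosen only for this example, and other case mappers may behave differently.

```python
def roundtrips(text):
    """True if uppercasing then lowercasing gives back the original text."""
    return text.upper().lower() == text

samples = {
    "plain alpha": "\u03b1",
    "alpha with psili and ypogegrammeni": "\u1f80",
    "sharp s": "\u00df",
}

for label, ch in samples.items():
    up = ch.upper()
    # Show the code points produced by the full uppercase mapping.
    codes = " ".join(f"U+{ord(c):04X}" for c in up)
    print(f"{label}: upper() -> {codes}, round trip ok: {roundtrips(ch)}")
```

Here the full uppercase mapping appends a capital iota (or expands the sharp s to two letters), and lowercasing afterwards does not recombine the sequence into the original character.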
2014-11-06 16:55 GMT+01:00 Mike FABIAN <[email protected]>:
>
> I have a question about “Uppercase” in DerivedCoreProperties.txt:
>
> U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
> is listed as “Lowercase” in
> http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :
>
> 1F80..1F87 ; Lowercase # L& [8] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
>
> But
>
> “U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”
> is *not* listed as “Uppercase” in
> http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .
>
> Although U+1F80 seems to be Uppercase according to
> http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> because it has a tolower mapping to U+1F80:
>
> 1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
> 1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
>
> Is the information in DerivedCoreProperties.txt correct or
> could this be a bug in DerivedCoreProperties.txt?
>
> The above is not only the case for U+1F88, but for several more characters.
>
> All the characters listed below have a tolower mapping in
> http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> but are not listed in DerivedCoreProperties.txt as “Uppercase”:
>
> U+1F88 ᾈ has a tolower mapping to U+1F80 ᾀ
> U+1F89 ᾉ has a tolower mapping to U+1F81 ᾁ
> U+1F8A ᾊ has a tolower mapping to U+1F82 ᾂ
> U+1F8B ᾋ has a tolower mapping to U+1F83 ᾃ
> U+1F8C ᾌ has a tolower mapping to U+1F84 ᾄ
> U+1F8D ᾍ has a tolower mapping to U+1F85 ᾅ
> U+1F8E ᾎ has a tolower mapping to U+1F86 ᾆ
> U+1F8F ᾏ has a tolower mapping to U+1F87 ᾇ
> U+1F98 ᾘ has a tolower mapping to U+1F90 ᾐ
> U+1F99 ᾙ has a tolower mapping to U+1F91 ᾑ
> U+1F9A ᾚ has a tolower mapping to U+1F92 ᾒ
> U+1F9B ᾛ has a tolower mapping to U+1F93 ᾓ
> U+1F9C ᾜ has a tolower mapping to U+1F94 ᾔ
> U+1F9D ᾝ has a tolower mapping to U+1F95 ᾕ
> U+1F9E ᾞ has a tolower mapping to U+1F96 ᾖ
> U+1F9F ᾟ has a tolower mapping to U+1F97 ᾗ
> U+1FA8 ᾨ has a tolower mapping to U+1FA0 ᾠ
> U+1FA9 ᾩ has a tolower mapping to U+1FA1 ᾡ
> U+1FAA ᾪ has a tolower mapping to U+1FA2 ᾢ
> U+1FAB ᾫ has a tolower mapping to U+1FA3 ᾣ
> U+1FAC ᾬ has a tolower mapping to U+1FA4 ᾤ
> U+1FAD ᾭ has a tolower mapping to U+1FA5 ᾥ
> U+1FAE ᾮ has a tolower mapping to U+1FA6 ᾦ
> U+1FAF ᾯ has a tolower mapping to U+1FA7 ᾧ
> U+1FBC ᾼ has a tolower mapping to U+1FB3 ᾳ
> U+1FCC ῌ has a tolower mapping to U+1FC3 ῃ
> U+1FFC ῼ has a tolower mapping to U+1FF3 ῳ
>
> Is that correct or a bug?
>
> --
> 📧 Mike FABIAN <[email protected]>
> 睡眠不足はいい仕事の敵だ。
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode

