2013/1/31 Joó Ádám <[email protected]>:
>> Blame the invention of the dot over the i, or the convention of omitting it
>> when adding accents, or the adoption much later of a specifically dotless i
>> into the Turkish alphabet...
>
> Or the invention of a soft accent, for that matter. If the dot would
> be explicitly encoded in all cases, no problem would arise.

It would be wrong. The soft dot did not exist initially and appeared only as a glyphic feature (in some medieval calligraphy for the cursive script). Today the presence of this soft dot is not justified in most languages, as it carries absolutely no semantics and CAN safely be omitted (even if most common non-cursive fonts still display it).

It is also quite common to have this soft dot decorated and replaced by something else, like a small heart, but in that case it carries supplemental semantics and should be explicitly encoded. But why isn't there a COMBINING HEART ABOVE? Most often this heart is drawn manually with strokes and not filled, but a filled variant would also exist, and if it were encoded then we would have two combining characters:
- COMBINING WHITE HEART ABOVE
- COMBINING BLACK HEART ABOVE

For usual Latin texts (except in Turkic alphabets), the soft dot should never be encoded, as it is a pure typographic feature: the soft-dotted small i (from ASCII) can equally be drawn with or without the dot, as long as there's no other combining character above it.

The encoding for Turkic however SHOULD NEVER use the soft-dotted i alone: it should be either the explicit dotless i, or the letter i with a combining dot above (though this one would have been better encoded as dotless i + combining dot above, so that Turkic languages would have avoided all confusion by using only the dotless i). But there is now a long history of using the soft-dotted i to encode the hard-dotted i used in Turkic alphabets, so both should be treated as equivalent, even if they are not strictly canonically equivalent. This is problematic unless we use collation rules that treat them as equivalent at all levels except the last binary level, which is kept for the few applications that still want to make the distinction (in a multilingual context, or when the language is not determined).

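
To make the collation idea concrete, here is a minimal sketch in Python. It is not a real tailoring of the Unicode Collation Algorithm: the names `turkic_fold` and `sort_key` are hypothetical, and the fold only handles a dot above that immediately follows the base letter after NFD. It simply treats soft-dotted i, i + dot above and dotless i + dot above as equal for comparison, and falls back to the raw code points as a last binary level.

```python
import unicodedata

DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE

def turkic_fold(text):
    """Hypothetical fold: map every spelling of the dotted i to plain 'i'.

    Assumption: the dot above directly follows the base letter after NFD;
    sequences with an intervening mark below are left untouched.
    """
    text = unicodedata.normalize("NFD", text)
    text = text.replace(DOTLESS_I + DOT_ABOVE, "i")  # dotless i + dot above
    text = text.replace("i" + DOT_ABOVE, "i")        # soft-dotted i + dot above
    return text

def sort_key(text):
    """Folded form first; the original string acts as the binary tie-breaker."""
    return (turkic_fold(text), text)

words = ["i" + DOT_ABOVE + "z", "ia", DOTLESS_I + DOT_ABOVE + "b"]
ordered = sorted(words, key=sort_key)
```

With this key the three spellings of the dotted i sort together, while the bare dotless i stays distinct, and two strings that differ only in how the dot is encoded still compare unequal at the last level.
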
Let's keep the ASCII small i as it is: always soft-dotted, with an optional dot above, which MUST disappear when there's any other combining character above it or attached above. For all other cases: where it MUST NEVER show a dot, use the dotless i; where it MUST ALWAYS show the dot, use preferably soft-dotted i + dot above (because this is the current practice, which also matches the Turkic special casing rules in the UCD), or mostly equivalently dotless i + dot above (knowing that it is a confusable, which should be listed as such in the auxiliary UCD file of confusables, because it is not canonically equivalent and not even compatibility equivalent). The same consideration applies to the soft-dotted j.
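
The claim that the two "hard-dotted" spellings are neither canonically nor compatibility equivalent can be checked with Python's standard `unicodedata` module. The helper name `equivalent` and the variable names are only illustrative; the result depends on the Unicode data shipped with the interpreter, although the decompositions involved here are stable.

```python
import unicodedata

SOFT_DOTTED_I = "\u0069"   # LATIN SMALL LETTER I
DOTLESS_I = "\u0131"       # LATIN SMALL LETTER DOTLESS I
DOT_ABOVE = "\u0307"       # COMBINING DOT ABOVE

def equivalent(a, b, form):
    """True if a and b are identical after normalization to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

preferred = SOFT_DOTTED_I + DOT_ABOVE   # soft-dotted i + dot above
confusable = DOTLESS_I + DOT_ABOVE      # dotless i + dot above

# NFD tests canonical equivalence, NFKD tests compatibility equivalence.
canonically_equal = equivalent(preferred, confusable, "NFD")
compatibility_equal = equivalent(preferred, confusable, "NFKD")

# Python's locale-independent lowercasing of U+0130 (capital I with dot
# above) yields the soft-dotted i followed by the combining dot above.
lowered = "\u0130".lower()

print(canonically_equal, compatibility_equal, lowered == preferred)
```

Both comparisons come out false, so only a confusables list or a tailored collation can relate the two sequences.
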

