From: "African Oracle" <[EMAIL PROTECTED]>
> If a can have U+0061 and have a composite that is U+00e2...U+...
> If e can have U+0065 and have a composite that is U+00ea...U+...
>
> Then why is e with accented grave or acute and dot below cannot be assigned
> a single unicode value instead of the combinational values 1EB9 0301 and
> etc....
>
> Since UNICODE is gradually becoming a defacto, I still think it will not be
> a bad idea to have such composite values.

I think the answer is that the existing precomposed characters, and their decompositions, come from the need to support round trips with preexisting legacy standards. That need is what justifies canonical equivalences and normalization. Beyond that, I don't think there is a preexisting African standard with which such a canonical equivalence is needed. In fact, having multiple ways to encode the same character is a form of pollution, but one that was needed to make Unicode work and interoperate with widely used legacy standards.

Finally, there has been a contractual agreement between Unicode, ISO/IEC 10646 and other standards bodies to keep a "stability policy" for normalization. Because of this policy, it is now impossible to define a canonical equivalence between a newly encoded precomposed character and a sequence of preexisting base letters and diacritics. So the only way to include e-with-acute-and-dot-below would be as a new, distinct code point, WITHOUT any canonical equivalence. This is not really a problem as long as the African languages needing this character adopt a consistent representation. But you can see immediately that text entered in decomposed form and text entered with the newer single precomposed code point could then never be made canonically equivalent. For Unicode, ISO/IEC 10646, and all other standards that depend on Unicode and have signed the policy agreement, the two sequences will be considered distinct forever. This won't be a problem if a new African standard chooses to use a single precomposed code point (such a standard should then clearly state that the character is NOT decomposable).

The other way to create a new decomposable character would be to define a decomposition containing at least one NEW code point. I doubt this would be desirable for the base letter e, or even for the acute accent. But it might be possible for the dot below.

One thing counts against this last approach: with how many base letters (possibly precomposed) must we define a composition with such a new African dot below character? Is the repertoire of letters with dot below completely closed (including base letters with other diacritics)? As soon as such a new African dot below is defined, every preexisting letter that can take it would have to be included in a decomposition pair. That seems difficult to achieve with a repertoire of African letters that is currently unbounded. (In the past this was not a problem, but the Unicode stability policies will not allow the repertoire to be extended later, once such an African dot below diacritic has been introduced in some version.)

So the simplest approach is to define nothing new, and to enter these African letters in their decomposed form (with the exception of letters with overlaid or ligatured diacritics, which should be encoded separately, without decompositions). Remember this: decompositions of Unicode characters are a pollution, needed only to support legacy standards and make them interoperable with or through Unicode.

This Unicode policy won't prevent the definition of a smaller African subset with its own charset encoding in which these letters are represented in precomposed form only; such a future standard (if there is a legitimate need for it) could also be defined with complete round-trip compatibility with Unicode decomposed characters.

In summary, for African letters: there is no need (and it is in fact impossible now) to encode in Unicode new letters with dot below unless the base letter is also absent from Unicode. But barred letters are good candidates for inclusion as isolated (non-decomposable) code points.
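
To make the point about decomposed input concrete, here is a small sketch, assuming Python's standard unicodedata module (the names `spellings` and `codepoints` are only illustrative, not part of any standard). It shows that the different ways of typing e with dot below and acute already normalize to the same sequence, the "1EB9 0301" mentioned in the question, and that a barred letter such as U+0268 carries no decomposition at all.

```python
import unicodedata

# Three ways a user might enter e + dot below + acute (illustrative list).
spellings = [
    "e\u0323\u0301",  # e, combining dot below, combining acute
    "e\u0301\u0323",  # same marks typed in the opposite order
    "\u00e9\u0323",   # precomposed e-acute followed by combining dot below
]

def codepoints(s):
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(c):04X}" for c in s)

# Normalization folds all three spellings into one canonical sequence.
nfc = {unicodedata.normalize("NFC", s) for s in spellings}
nfd = {unicodedata.normalize("NFD", s) for s in spellings}

print(sorted(codepoints(s) for s in nfc))
print(sorted(codepoints(s) for s in nfd))

# U+1EB9 (e with dot below) has a canonical decomposition;
# U+0268 (i with stroke, a barred letter) has none.
print(repr(unicodedata.decomposition("\u1eb9")))
print(repr(unicodedata.decomposition("\u0268")))
```

So, as far as I can tell, software that normalizes its input already treats these decomposed spellings as one and the same letter, without any new code point being needed.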

