Thanks for your wonderful response. I used the two letters I mentioned only as an example. As many on this forum may already know, the African letter e with dot below is a distinct letter in its own right, like o with dot below or s with dot below. The dot below has long been a problem, especially in font design and development, and I think that to date only my company has addressed it, so that an underline does not overlay the dot and prevent it from showing. Please see http://www.dnetcom.com/fonts/ariyaimage2.html
Instead of choosing e as the base character, e with dot below and o with dot below can be used as the base, as you rightly suggested, and the accents can then be composed with either of the two.

Dele Olawole

----- Original Message -----
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: "African Oracle" <[EMAIL PROTECTED]>
Cc: "Unicode List" <[EMAIL PROTECTED]>
Sent: Tuesday, May 04, 2004 11:14 PM
Subject: Re: Just if and where is the then?

> From: "African Oracle" <[EMAIL PROTECTED]>
> > If a can have U+0061 and have a composite that is U+00e2...U+...
> > If e can have U+0065 and have a composite that is U+00ea...U+...
> >
> > Then why can e with grave or acute accent and dot below not be assigned
> > a single Unicode value instead of the combined values 1EB9 0301 and
> > etc....
> >
> > Since Unicode is gradually becoming a de facto standard, I still think it
> > will not be a bad idea to have such composite values.
>
> I think that the response is that decompositions come from the need to support
> roundtrips with legacy preexisting standards. This justifies the need to offer
> canonical equivalences and normalizations.
>
> Outside this, I don't think there's a preexisting African standard with which
> such canonical equivalence is needed. In fact the existence of multiple ways to
> encode the same characters is a pollution, but something needed to make Unicode
> work and interoperate with widely used previous legacy standards.
>
> Finally, there has been a contractual agreement between Unicode, ISO/IEC 10646
> and other standards bodies, to keep a "stability policy" for normalizations. Due
> to this policy, it's impossible now to define a canonical equivalence between a
> newly-encoded precombined character and a sequence composed of preexisting base
> letters and diacritics.
>
> So this means that the only way to include e-with-acute-and-dot-below would be
> to include it as a new distinct code point, WITHOUT any canonical equivalence.
> This is not really a problem as long as the African languages needing this
> character adopt a consistent representation. But you will see immediately that
> it will become impossible to define a standard canonical equivalence between
> characters entered in decomposed forms and newer characters entered as a single
> precombined code point. For Unicode, ISO/IEC 10646, and for all other standards
> which depend on Unicode and which have signed the policy agreement, these
> sequences will be considered distinct, forever.
>
> This won't be a problem if a new African standard is adopted that decides to use
> a single precombined code point (this standard should then really indicate that
> the character is NOT decomposable).
>
> The other way to create a new decomposable character would be to define
> decompositions containing at least one NEW code point. I doubt this would be
> desired for the base letter e, or even for the acute accent. But it may be
> possible for the dot below.
>
> One thing argues against this last approach: with how many base letters
> (possibly precombined) must we define a composition with such a new African dot
> below character? Is the repertoire of letters with dot below completely closed
> (including base letters with other diacritics)? As soon as such a new African
> dot below is defined, all the possible preexisting letters would have to be
> included in a decomposition pair. It seems difficult to achieve this goal with a
> repertoire of African letters which is currently not bounded. (In the past it
> was not a problem, but Unicode stability policies will not allow this repertoire
> to be extended later, once such an African dot below diacritic has been
> introduced in some version.)
>
> So the simplest approach is to not define anything, and enter these African
> letters in their decomposed form (with the exception of letters with overlaying
> or ligaturing diacritics, which should be encoded separately, without
> decompositions).
>
> Remember this: decompositions of Unicode characters are a pollution needed only
> for supporting legacy standards and making them interoperable with or through
> Unicode.
>
> This Unicode policy won't prevent the possible definition of a smaller African
> subset with its own charset encoding where these letters are represented in
> their precomposed form only; it will also be possible to define such a possible
> future standard (if there's a legitimate need for it) with complete roundtrip
> compatibility with Unicode decomposed characters.
>
> In summary, for African letters: there's no need (and it's in fact impossible
> now) to encode in Unicode new letters with dots below unless the base letter is
> also absent from Unicode. But barred letters are good candidates for inclusion
> as isolated (not decomposable) code points.
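
For anyone who wants to check the behaviour discussed in this thread, here is a minimal sketch using Python's standard unicodedata module. The choice of Python and the helper name code_points are my own assumptions for illustration; nothing in the thread prescribes a tool. It shows that e plus circumflex composes to the single code point U+00EA under NFC, while e with dot below and acute has no precomposed form and stays as the two code points U+1EB9 U+0301, whichever order the marks were typed in.

```python
import unicodedata


def code_points(text):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# e + COMBINING CIRCUMFLEX ACCENT: a precomposed character exists (U+00EA).
e_circumflex = unicodedata.normalize("NFC", "e\u0302")

# e + COMBINING DOT BELOW + COMBINING ACUTE ACCENT: only e-with-dot-below
# (U+1EB9) is precomposed, so the acute stays a separate combining mark.
e_dot_acute = unicodedata.normalize("NFC", "e\u0323\u0301")

# The same letter entered as e-acute (U+00E9) followed by dot below is
# canonically equivalent, so NFC yields the same two code points.
e_acute_dot = unicodedata.normalize("NFC", "\u00e9\u0323")

# NFD gives the fully decomposed form: base letter, dot below, acute.
decomposed = unicodedata.normalize("NFD", "\u1eb9\u0301")

print(code_points(e_circumflex))  # U+00EA
print(code_points(e_dot_acute))   # U+1EB9 U+0301
print(code_points(e_acute_dot))   # U+1EB9 U+0301
print(code_points(decomposed))    # U+0065 U+0323 U+0301
```

So text stored in a consistent normalization form compares equal regardless of how the letter was entered, which is the practical point of entering these letters in decomposed (or NFC-normalized) form.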

