Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:
> GB is a different from G+B You do not pronunce the letters separately
> but people that do not know anything about the language do which is
> wrong. It is about correction and proper representation.

What Michael and others have been trying to say is this: Unicode encodes characters, not languages.

The word "character" means different things to ordinary people, depending on what language they speak and what script they write. "Characters" in Unicode do not always correspond 1-to-1 with "letters" in a given language's alphabet.

Here are some quick and dirty definitions for our purposes:

Character: the basic unit of text encoding.
Letter: the basic unit of a language's orthography. Not necessarily the same as "character."
Glyph: the visual representation of a character. Also not necessarily the same as "character."

In Spanish, the combination "ch" is considered a distinct letter of the alphabet. It has its own name, "che." Children learn it as a letter that comes between "c" and "d". This is all well and good, but when it comes to representing text in computers, there is no separate "ch" character in any of the encodings people have used for decades. Spanish text contains the two characters "c" and "h". That has always been the case, and it remains true in Unicode.

Likewise, in Yoruba, if there is no visual distinction between (1) the letter "GB" and (2) the two letters "G" and "B" that happen to appear together, as in your example, then the letter "GB" is encoded with the two characters "G" and "B". This does not deny the existence of a letter "GB" in the Yoruba language; it just determines how that letter is encoded in computerized text.

Now if you need to perform some other type of text processing, such as searching, sorting, spell-checking, or line-breaking, your software may need to understand the difference between the letter "GB" and the two letters "G" + "B". But that has to be handled by the software, not by the character encoding.
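To make that division of labor concrete, here is a minimal Python sketch. It is not from the original thread, and every name in it (`code_points`, `yoruba_letters`, `yoruba_sort_key`, `YORUBA_ALPHABET`) is hypothetical. It assumes the 25-letter modern Yoruba alphabet order, input without tone marks, and that every "g" immediately followed by "b" is the letter GB. The stored text is the same two code points either way; only the application's sort key knows that GB is one letter.

```python
import unicodedata

# Assumed ordering of the modern Yoruba alphabet (tone marks ignored).
# "gb" is the only letter spelled with two characters.
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "\u1eb9", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "\u1ecd", "p", "r", "s", "\u1e63", "t", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}
DIGRAPHS = {"gb"}


def code_points(text):
    """Encoding level: list the Unicode code points actually stored."""
    return [f"U+{ord(ch):04X}" for ch in text]


def yoruba_letters(word):
    """Orthography level: split a word into letters, treating 'gb' as one.

    Assumption: every 'g' immediately followed by 'b' is the letter GB.
    """
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def yoruba_sort_key(word):
    """Rank letters by alphabet position; characters not listed sort last."""
    # NFC turns e + U+0323 into the precomposed U+1EB9, and so on.
    word = unicodedata.normalize("NFC", word)
    return [
        (RANK.get(letter.lower(), len(RANK)), letter.lower())
        for letter in yoruba_letters(word)
    ]


if __name__ == "__main__":
    print(code_points("GB"))         # ['U+0047', 'U+0042']
    print(yoruba_letters("gbogbo"))  # ['gb', 'o', 'gb', 'o']
    words = ["gbogbo", "ha", "gun", "ba"]  # illustrative strings only
    print(sorted(words))             # ['ba', 'gbogbo', 'gun', 'ha']
    print(sorted(words, key=yoruba_sort_key))  # ['ba', 'gun', 'gbogbo', 'ha']
```

Plain code point order puts "gbogbo" before "gun" because "b" < "u"; the letter-aware key puts it after, because G comes before GB in the alphabet. A real application would more likely rely on a collation library with locale-specific tailoring than on hand-written rules like these, but the principle is the same: the encoding stays "G" + "B", and the letter GB lives in the processing layer.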
> Here are few Yoruba alphabets which might not be new to you, so how
> can you equate G+B with GB even if you claimed it has significant. How
> significant is significant?
>
> A B D E Ẹ F G GB....

Actually, there are quite a few people on this list who are familiar with the letters of the Yoruba alphabet, and they are also familiar with the encoding principles of Unicode. That is why they are saying: yes, we know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" + U+0042 "B".

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

