Thanks Doug. All contributions are appreciated.
Regards
Dele

----- Original Message -----
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Cc: "African Oracle" <[EMAIL PROTECTED]>; "Michael Everson" <[EMAIL PROTECTED]>
Sent: Monday, May 03, 2004 5:11 PM
Subject: Re: Nice to join this forum....

> Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:
>
> > GB is different from G+B. You do not pronounce the letters separately,
> > but people who do not know anything about the language do, which is
> > wrong. It is about correction and proper representation.
>
> What Michael and others have been trying to say is this:
>
> Unicode encodes characters, not languages. The word "character" means
> different things to ordinary people, depending on what language they
> speak and what script they write. "Characters" in Unicode do not always
> correspond 1-to-1 with "letters" in a given language's alphabet.
>
> Here are some quick and dirty definitions for our purposes:
>
> Character: the basic unit of text encoding.
> Letter: the basic unit of a language's orthography. Not necessarily the
> same as "character."
> Glyph: the visual representation of a character. Also not necessarily
> the same as "character."
>
> In Spanish, the combination "ch" is considered a distinct letter of the
> alphabet. It has its own name, "che." Children learn it as a letter
> that comes between "c" and "d". This is all good, but when it comes to
> representing text in computers, there is no separate "ch" letter in any
> of the encodings that people have used for decades. Spanish text
> includes the two characters "c" and "h". That was true before Unicode,
> and it is still true in Unicode.
>
> Likewise, in Yoruba, if there is no visual distinction between (1) the
> letter "GB" and (2) the two letters "G" and "B" that happen to appear
> together, as in your example, then the letter "GB" is encoded with the
> two characters "G" and "B".
> This does not deny the existence of a letter "GB" in the Yoruba
> language; it just dictates how that letter is encoded in computerized
> text.
>
> Now if you need to perform some other type of text processing, such as
> searching, sorting, spell-checking or line-breaking, then your
> software may need to understand the difference between the letter "GB"
> and the two letters "G" + "B". But this needs to be handled by the
> software, not by the character encoding mechanism.
>
> > Here are a few Yoruba letters, which might not be new to you. So how
> > can you equate G+B with GB, even if you claim it has significance?
> > How significant is significant?
> >
> > A B D E Ẹ F G GB....
>
> Actually, there are quite a few people on this list who are familiar
> with the letters of the Yoruba alphabet, and they are also familiar with
> the encoding principles of Unicode. That is why they are saying, yes, we
> know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" +
> U+0042 "B".
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/
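A minimal Python sketch can illustrate the division of labour described above: the stored text for "GB" is just U+0047 U+0042, and it is application code that groups "gb" into a single letter when sorting. The names `YORUBA_ALPHABET`, `code_points`, `letters` and `yoruba_sort_key` are hypothetical helpers for illustration only. The sketch assumes the standard 25-letter Yoruba alphabet, NFC-normalized input, and no tone marks (tone-marked vowels simply sort after all known letters).

```python
import unicodedata

# Standard 25-letter Yoruba alphabet, in order. "gb" is the only digraph.
# \u1eb9, \u1ecd and \u1e63 are e, o and s with dot below.
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "\u1eb9", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "\u1ecd", "p", "r", "s", "\u1e63", "t", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}
DIGRAPHS = [letter for letter in YORUBA_ALPHABET if len(letter) > 1]


def code_points(text):
    """List the Unicode code points that actually encode `text`."""
    return [f"U+{ord(ch):04X}" for ch in text]


def letters(word):
    """Split a word into lowercase Yoruba letters, treating "gb" as one.

    The stored text is unchanged: "gb" is still two characters. The
    grouping happens here, in application code, not in the encoding.
    """
    # NFC turns e + U+0323 COMBINING DOT BELOW into the single U+1EB9.
    word = unicodedata.normalize("NFC", word).lower()
    result, i = [], 0
    while i < len(word):
        for dg in DIGRAPHS:
            if word.startswith(dg, i):
                result.append(dg)
                i += len(dg)
                break
        else:
            result.append(word[i])
            i += 1
    return result


def yoruba_sort_key(word):
    """Rank each letter by its position in the Yoruba alphabet.

    Characters outside the alphabet (including tone-marked vowels, which
    this sketch does not handle) sort after every known letter.
    """
    return [(RANK.get(letter, len(RANK)), letter) for letter in letters(word)]


if __name__ == "__main__":
    print(code_points("GB"))
    print(letters("Gbogbo"))
    words = ["gba", "ge", "gu"]
    print(sorted(words))                       # plain code-point order
    print(sorted(words, key=yoruba_sort_key))  # Yoruba alphabetical order
```

Plain code-point sorting puts "gba" before "ge", because "b" precedes "e". The Yoruba-aware key puts "gba" after "gu", because the letter GB follows G in the alphabet, even though the text itself is still stored as separate "g" and "b" characters.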

