Thanks, Doug. All contributions are appreciated.

Regards

Dele

----- Original Message ----- 
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Cc: "African Oracle" <[EMAIL PROTECTED]>; "Michael Everson"
<[EMAIL PROTECTED]>
Sent: Monday, May 03, 2004 5:11 PM
Subject: Re: Nice to join this forum....


> Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:
>
> > GB is different from G+B. You do not pronounce the letters separately,
> > but people who do not know anything about the language do, which is
> > wrong. It is about correction and proper representation.
>
> What Michael and others have been trying to say is this:
>
> Unicode encodes characters, not languages.  The word "character" means
> different things to ordinary people, depending on what language they
> speak and what script they write.  "Characters" in Unicode do not always
> correspond 1-to-1 with "letters" in a given language's alphabet.
>
> Here are some quick and dirty definitions for our purposes:
>
> Character:  the basic unit of text encoding.
> Letter: the basic unit of a language's orthography.  Not necessarily the
> same as "character."
> Glyph:  the visual representation of a character.  Also not necessarily
> the same as "character."
>
> In Spanish, the combination "ch" is considered a distinct letter of the
> alphabet.  It has its own name, "che."  Children learn it as a letter
> that comes between "c" and "d".  This is all good, but when it comes to
> representing text in computers, there is no separate "ch" letter in any
> of the encodings that people have used for decades.  Spanish text
> includes the two characters "c" and "h".  This has been true for
> decades, and it is also true when using Unicode.
>
> Likewise, in Yoruba, if there is no visual distinction between (1) the
> letter "GB" and (2) the two letters "G" and "B" that happen to appear
> together, as in your example, then the letter "GB" is encoded with the
> two characters "G" and "B".  This does not deny the existence of a
> letter "GB" in the Yoruba language, it just dictates how that letter is
> encoded in computerized text.
>
> Now if you need to perform some other type of text processing, such as
> searching or sorting or spell-checking or line-breaking, then your
> software may need to understand the difference between the letter "GB"
> and the two letters "G" + "B".  But this needs to be handled by the
> software, not the character encoding mechanism.
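
[A minimal Python sketch of the point above, not anything from the
original message: the text stays encoded as plain "g" + "b", and the
software groups the pair into one letter when it sorts. The names
yoruba_letters and yoruba_sort_key are hypothetical. The sketch assumes
that every "g" followed by "b" is the letter "gb", and it ignores tone
marks when ordering.]

```python
import unicodedata

# Yoruba alphabet order. "gb" is one letter and sorts after "g".
# \u1eb9, \u1ecd and \u1e63 are e, o and s with dot below (precomposed).
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "\u1eb9", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "\u1ecd", "p", "r", "s", "\u1e63", "t", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}

# Combining grave, acute and macron, used as tone marks.
# Assumption: tone is ignored for ordering in this sketch.
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def yoruba_letters(word):
    """Split a word into Yoruba letters, treating "gb" as one letter."""
    text = unicodedata.normalize("NFD", word.lower())
    text = "".join(ch for ch in text if ch not in TONE_MARKS)
    text = unicodedata.normalize("NFC", text)
    letters = []
    i = 0
    while i < len(text):
        if text.startswith("gb", i):
            letters.append("gb")
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


def yoruba_sort_key(word):
    """Sort key based on alphabet position; unknown characters sort last."""
    return [RANK.get(letter, len(RANK)) for letter in yoruba_letters(word)]


words = ["gbogbo", "ha", "ga", "go"]
by_code_point = sorted(words)
by_alphabet = sorted(words, key=yoruba_sort_key)
```

[Here by_code_point puts "gbogbo" between "ga" and "go", while
by_alphabet puts it after "go", because the letter "gb" follows "g".
The stored characters are identical in both cases; only the sorting
logic differs.]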
>
> > Here are a few letters of the Yoruba alphabet, which might not be new
> > to you, so how can you equate G+B with GB, even if you claim it is
> > significant? How significant is significant?
> >
> > A B D E Ẹ F G GB....
>
> Actually there are quite a few people on this list who are familiar with
> the letters of the Yoruba alphabet, and they are also familiar with the
> encoding principles of Unicode.  That is why they are saying, yes, we
> know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" +
> U+0042 "B".
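
[A small illustration in Python, added as a sketch: listing the code
points of the text "GB" shows the two characters named above. The
helper name code_points is hypothetical.]

```python
def code_points(text):
    """Return the Unicode code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


gb_points = code_points("GB")       # two characters, two code points
gb_utf8 = "GB".encode("utf-8")      # two bytes in UTF-8: 0x47 0x42
```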
>
> -Doug Ewell
>  Fullerton, California
>  http://users.adelphia.net/~dewell/
>
>
>



