Michael Sierchio wrote:

> "Rene G. Eberhard (keyon)" wrote:
>
> > ...Unicode for example is supported by
> > Universal and UTF8.
>
> I also meant to point out that UTF-8 supports ASCII, but not EBCDIC, for
> example (not that I imagine that anyone would want to use the latter...;-)

Well, we're drifting away from the subject of this list, but after all,
everyone should have a clear idea of how internationalization works.

>2)      UTF-8 supports the small subset of Unicode encodings
>        that have 8 bit characters

No. Every Unicode character can be represented in UTF-8. UTF-8 is a
transformation format for Unicode. A normal program will choke on 16-bit
Unicode strings, because they typically contain zero bytes, which C string
functions treat as the end of the string. Therefore the text is transformed
into UTF-8, so that it looks like an ordinary string of 8-bit characters, and
transformed back into Unicode at the other end. This should be transparent to
any program in the middle, which does not need to know exactly what the string
represents. In fact, UTF-8 can even represent characters of 32-bit Unicode
(UCS-4) that are unavailable in 16-bit Unicode (UCS-2).
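To make this concrete, here is a small Python sketch (Python is just my choice
for illustration; this is not OpenSSL code) that uses the standard codecs to
compare the UTF-16 and UTF-8 bytes of the same text. The sample strings are
arbitrary.

```python
# Compare how the same text looks as UTF-16 and as UTF-8 bytes.
text = "Hi \u00e9"               # "Hi é": ASCII letters plus one accented letter
beyond_16_bits = "\U00010400"   # DESERET CAPITAL LETTER LONG I, above U+FFFF

utf16 = text.encode("utf-16-be")
utf8 = text.encode("utf-8")

# UTF-16 puts a zero byte next to every ASCII character, which breaks
# C-style code that treats a zero byte as the end of the string.
print(utf16)   # b'\x00H\x00i\x00 \x00\xe9'

# UTF-8 keeps the ASCII bytes unchanged and contains no zero byte here.
print(utf8)    # b'Hi \xc3\xa9'

# The receiving end transforms the bytes back into the same characters.
assert utf8.decode("utf-8") == text

# A character outside the 16-bit range still has a UTF-8 form (4 bytes).
print(beyond_16_bits.encode("utf-8"))  # b'\xf0\x90\x90\x80'
```

Today UTF-8 is limited to U+10FFFF (RFC 3629), which still covers every
character Unicode defines.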

UTF-8 is becoming the standard way of transporting internationalized
strings, replacing the many different encodings that are each specific to one
country or a handful of countries.

> I also meant to point out that UTF-8 supports ASCII, but not EBCDIC, for
> example (not that I imagine that anyone would want to use the latter...;-)

The way you present this seems confusing to me. UTF-8 "supports" everything;
ASCII simply has a "direct" conversion.

In UTF-8, bytes that do not have the 8th bit set (values 0x00 to 0x7F)
directly represent their ASCII equivalents.
So if you have a pure ASCII string (no byte with the 8th bit set), it is also
a valid UTF-8 string that represents the same characters.
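A quick Python check of this property, using an arbitrary sample string
(illustration only):

```python
# Every byte of a pure ASCII string is below 0x80 (8th bit clear), and
# UTF-8 encodes those code points as the very same single bytes.
ascii_text = "Hello, OpenSSL!"

ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")

assert all(b < 0x80 for b in ascii_bytes)          # no byte has the 8th bit set
assert ascii_bytes == utf8_bytes                   # identical byte sequences
assert ascii_bytes.decode("utf-8") == ascii_text   # valid UTF-8 as-is
print("pure ASCII is already valid UTF-8")
```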

If the 8th bit is set, the byte is part of a multi-byte sequence: several
bytes must be combined to form one Unicode character.
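The bit layout can be sketched by hand. This is a minimal Python encoder
written for illustration (the name utf8_encode is hypothetical); it follows the
current limit of U+10FFFF and at most four bytes from RFC 3629, whereas the
original definition of UTF-8 allowed sequences of up to six bytes. It is
cross-checked against Python's built-in codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point as UTF-8 by hand (RFC 3629 range, up to U+10FFFF)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:          # up to 7 bits: 0xxxxxxx (plain ASCII)
        return bytes([code_point])
    if code_point < 0x800:         # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:       # up to 16 bits: 1110xxxx + 2 continuation bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx + 3 continuation bytes
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Cross-check against Python's built-in codec.
for cp in (0x41, 0xE9, 0x20AC, 0x10400):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Each continuation byte carries 6 payload bits, and the lead byte carries the
rest plus the length marker.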
The "good" point about UTF-8 is that there is a simple rule for knowing how
many bytes must be combined, given the value of the first one, and if you jump
into the middle of the text, you can resynchronize at the beginning of the
next character.
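Both properties come from the bit patterns: the lead byte announces the length
(0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx) and every continuation byte has the
form 10xxxxxx. Here is a minimal Python sketch using only these rules; the
helper names are hypothetical, and it assumes well-formed input (no checks for
overlong forms or truncated sequences).

```python
def sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx
        return 4
    raise ValueError(f"0x{lead:02X} is not a valid lead byte")


def is_continuation(byte: int) -> bool:
    """Continuation bytes always have the form 10xxxxxx."""
    return byte >> 6 == 0b10


def resync(data: bytes, pos: int) -> int:
    """From an arbitrary offset, skip forward to the start of the next character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


def split_characters(data: bytes, start: int = 0) -> list[bytes]:
    """Split a buffer into characters using only the lead-byte rule (no validation)."""
    pos = resync(data, start)
    chars = []
    while pos < len(data):
        n = sequence_length(data[pos])
        chars.append(data[pos:pos + n])
        pos += n
    return chars


data = "a\u00e9\u20ac\U00010400".encode("utf-8")   # 1 + 2 + 3 + 4 bytes
print([c.hex(" ") for c in split_characters(data)])
# ['61', 'c3 a9', 'e2 82 ac', 'f0 90 90 80']

# Offset 4 lands inside the 3-byte euro sign; resync skips to offset 6.
print(resync(data, 4))   # 6
```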

If you need to transfer EBCDIC text using UTF-8, you have to convert it to
Unicode, then convert the Unicode to UTF-8. At the receiving end, you do the
opposite.
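In Python this round trip is a couple of codec calls. EBCDIC is a family of
code pages, so sender and receiver must agree on which one is in use; the
sketch assumes code page 037 (US/Canada), available as Python's built-in
"cp037" codec.

```python
# "Hello" in EBCDIC code page 037 (assumed variant).
ebcdic_bytes = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])

# Sending side: EBCDIC -> Unicode -> UTF-8.
text = ebcdic_bytes.decode("cp037")
utf8_bytes = text.encode("utf-8")

# Receiving side: UTF-8 -> Unicode -> EBCDIC.
received = utf8_bytes.decode("utf-8").encode("cp037")

print(text)         # Hello
print(utf8_bytes)   # b'Hello'
assert received == ebcdic_bytes
```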

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
