Julian Reschke wrote:
Jacob Lund wrote:

Ok! Let me see if I can explain myself - I am not an expert on this so
please correct me if I am wrong!

A UTF-8 representation of one character consists of a combination of
characters. Now Java is a Unicode language, and this means that one character
can represent "any" type of character in the world!


...of bytes.


Almost. Java's characters are only 16 bits wide, so there is a class of Unicode characters that must be represented as a sequence of two Java characters (a surrogate pair).
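A minimal sketch of that point (the class name is my own, for illustration): a character outside the Basic Multilingual Plane, such as U+1F600, occupies two Java chars but is still a single Unicode code point.

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so Java stores it as a
        // surrogate pair of two 16-bit chars.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 -- two Java chars
        System.out.println(s.codePointCount(0, s.length())); // 1 -- one Unicode character
    }
}
```

This is why `String.length()` counts chars, not characters as a user would perceive them.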

Basically, UTF-8 only makes sense when working on an "old" 7-bit ASCII system
and you need to use characters not available in the given code page.


UTF-8 always makes sense when you need backward compatibility with ASCII.
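To illustrate the backward compatibility (a sketch; the class name is my own): a pure-ASCII string produces byte-for-byte identical output in ASCII and UTF-8, while non-ASCII characters become multi-byte sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    public static void main(String[] args) {
        // ASCII text encodes to identical bytes under both charsets.
        byte[] ascii = "Hello".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8  = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ascii, utf8)); // true

        // U+00E9 (e with acute accent) needs two bytes in UTF-8.
        System.out.println("\u00E9".getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```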

Both UTF-8 and UTF-16 use a varying number of bytes to represent one
character, whereas Unicode always uses 32-bit characters (maybe it is 24-bit).


Unicode doesn't "represent" at all. Unicode is just a definition of code points.

*Encodings* represent Unicode characters as byte sequences, and UTF-8 and UTF-16 are two of the Unicode encodings.
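The distinction can be sketched like this (class name mine): the same code points yield different byte sequences depending on which encoding is applied.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // Two code points: U+0041 ('A') and U+00E9 (e with acute accent).
        String s = "A\u00E9";

        // The code points are fixed; the byte representation depends on the encoding.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 3: 1 + 2 bytes
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 4: 2 + 2 bytes
    }
}
```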

> ...

Julian


Julian puts my points far more succinctly :-)
So, if this is enough, there is no need to read my rather lengthy email on some of the gory details of Unicode.


Mike






