On Thursday, March 27, 2008, 22:16:38, Peter Palmreuther wrote:

V>> My  advice  would  be: please DONT use Unicode. That uses two bytes to
V>> show e.g. a letter "a" instead of one.

That's very bad advice. We're not using 300 baud modems anymore, and
a typical Received header added when the message passes through an SMTP
server adds more bytes to the message than using UTF-8 instead of
one of the legacy 8-bit encodings does. Also, UTF-8 needs just 1 byte to
represent characters in the ASCII range (if the text is valid ASCII,
it's also valid UTF-8).
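
A quick way to check this yourself is the small Python sketch below. The
two sample strings are made up for illustration; it compares the encoded
sizes and confirms that ASCII text comes out byte-for-byte identical in
UTF-8.

```python
# Sample strings are invented for this illustration.
ascii_text = "Please don't use Unicode."
german_text = "Gr\u00fc\u00dfe aus M\u00fcnchen"  # "Grüße aus München"

# Pure ASCII text produces exactly the same bytes in ASCII and in UTF-8.
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# In a legacy 8-bit charset every character is one byte; in UTF-8 only
# the non-ASCII characters (three of them here) grow to two bytes each.
latin1_size = len(german_text.encode("iso-8859-1"))
utf8_size = len(german_text.encode("utf-8"))
overhead = utf8_size - latin1_size

print(same_bytes, latin1_size, utf8_size, overhead)
```

For this sample the difference is three bytes in total, far less than a
single Received header line.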

> Just to clarify this: Unicode != UTF-8! That's an important
> difference!!!

Actually, UTF-8 is just one of the encoding forms defined by Unicode
(UTF-16 and UTF-32 are the others).

> UTF-8 is an *ENCODING* of these characters. In short: UTF-8 says which
> position in the Unicode-table the character is at. UTF-8 uses 8 bit
> for standard ISO-8859 characters and 16 bit for "special" characters
> (like e.g. German umlauts).

To be precise, UTF-8 uses 1-4 *bytes* to encode Unicode characters.
How many bytes it uses depends on the character. All ASCII characters
are represented as themselves, and most Latin characters with
diacritics (plus some non-Latin alphabets) fit into two bytes.
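
To make the byte counts concrete, here is a minimal Python sketch that
prints the UTF-8 length of a few characters. The selection of characters
is my own, picked to cover each length from 1 to 4 bytes.

```python
# Characters chosen for illustration, written as escapes so the example
# does not depend on the encoding of this message.
samples = [
    ("a", "a"),                  # ASCII letter
    ("a-umlaut", "\u00e4"),      # German umlaut
    ("c-caron", "\u010d"),       # Latin letter with diacritic
    ("Cyrillic ya", "\u044f"),   # non-Latin alphabet
    ("euro sign", "\u20ac"),     # symbol outside the two-byte range
    ("emoji", "\U0001f600"),     # character outside the BMP
]

# Map each label to the number of bytes its UTF-8 encoding takes.
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples}

for name, ch in samples:
    print(f"U+{ord(ch):04X} {name}: {lengths[name]} byte(s)")
```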

V>> The world has chosen it because of laziness. Instead of sending the
V>> charset and then a charcodes, people send a lot of byte 0 nowadays.

> *ONLY* when using e.g. UTF-16 (or UCS-2) as character *encoding*.

Most e-mail gateways can't handle the NUL character anyway, thus
making UTF-16 unusable for e-mail (unless it's Base64-encoded).
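
The NUL bytes are easy to see in a short Python sketch. The sample text
is arbitrary, and I use the "utf-16-le" codec (no byte order mark) purely
to keep the byte count simple.

```python
import base64

text = "Hello"  # arbitrary ASCII sample

utf16 = text.encode("utf-16-le")  # two bytes per character here
utf8 = text.encode("utf-8")       # one byte per character here

# Every ASCII character carries a 0x00 high byte in UTF-16.
nul_in_utf16 = utf16.count(0)
nul_in_utf8 = utf8.count(0)

# Base64 maps the raw bytes onto a plain ASCII alphabet, so the NULs
# disappear from the transferred data and come back on decoding.
encoded = base64.b64encode(utf16)
decoded = base64.b64decode(encoded).decode("utf-16-le")

print(len(utf16), nul_in_utf16, nul_in_utf8, encoded, decoded)
```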

-- 
< Jernej Simončič ><><><><>< http://eternallybored.org/ >

[The Bat! v4.0.18.6 on Windows XP Professional x64 Edition 5.2.3790.Service 
Pack 2]

Fat expands to fill any apparel worn.
       -- Stanley's Laws of Fat

