On Thursday, March 27, 2008, 22:16:38, Peter Palmreuther wrote:

V>> My advice would be: please DONT use Unicode. That uses two bytes to
V>> show e.g. a letter "a" instead of one.

That's very bad advice. We're not using 300 baud modems anymore, and a
typical Received header added when the message passes through an SMTP
server adds more bytes to the message than switching from one of the
legacy 8-bit encodings to UTF-8 does. Also, UTF-8 needs just 1 byte to
represent characters in the ASCII range (if the text is valid ASCII,
it's also valid UTF-8).

> Just to clarify this: Unicode != UTF-8! That's an important
> difference!!!

Actually, UTF-8 is just one of the encodings defined for Unicode.

> UTF-8 is an *ENCODING* of these characters. In short: UTF-8 says which
> position in the Unicode-table the character is at. UTF-8 uses 8 bit
> for standard ISO-8859 characters and 16 bit for "special" characters
> (like e.g. German umlauts).

To be precise, UTF-8 uses 1-4 *bytes* to encode Unicode characters. How
many bytes it uses depends on the character. All ASCII characters are
represented as themselves, and most Latin letters with diacritics
(everything up to U+07FF, which also covers some non-Latin alphabets
such as Greek and Cyrillic) fit into two bytes.

V>> The world has chosen it because of laziness. Instead of sending the
V>> charset and then a charcodes, people send a lot of byte 0 nowadays.

> *ONLY* when using e.g. UTF-16 (or UCS-2) as character *encoding*.

Most e-mail gateways can't handle the NUL character anyway, which makes
UTF-16 unusable for e-mail (unless it's Base64-encoded).

-- 
< Jernej Simončič ><><><><>< http://eternallybored.org/ >
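
The byte counts discussed above are easy to check. Here is a minimal Python sketch; the sample characters and the helper name encoded_lengths are my own choices for illustration, and UTF-16 is measured little-endian without a byte order mark:

```python
# Sample characters, written as escapes so the file itself stays ASCII.
samples = [
    "a",           # U+0061, ASCII
    "\u00fc",      # U+00FC, German u-umlaut
    "\u010d",      # U+010D, c with caron
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]


def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for the string ch."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# Valid ASCII text is byte-for-byte identical when encoded as UTF-8.
assert "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")

# UTF-16 puts NUL bytes into ASCII text; UTF-8 does not.
assert b"\x00" in "a".encode("utf-16-le")
assert b"\x00" not in "a".encode("utf-8")
```

The letter "a" comes out as 1 byte in UTF-8 and 2 bytes in UTF-16, so the "two bytes for a letter a" and "a lot of byte 0" complaints apply to UTF-16, not to UTF-8.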

