At 4:21 pm +0200 5/2/04, Alexander N. Treyner wrote:
Hi John, Your code works perfectly. But I found one strange thing. For example, I have the following string:
hello שלום hello world!!!!
which the mail client converts to
hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?=
After converting it to utf-8 with the code you wrote, the "_" is still present between the second "hello" and "world".
Is this the correct behaviour?
I'm not familiar with the way these headers are composed and was simply providing the code to do the x to utf-8 conversion. If the underline character has to be transliterated to a space then that's very simple. I don't know how a literal underline is written in that case -- I presume they must use =5F. I don't know which RFC deals with these headers, otherwise I could tell you better.
It's probably dealt with in <http://www.ietf.org/rfc/rfc2047.txt> ....
Ah yes:
4.2. The "Q" encoding
The "Q" encoding is similar to the "Quoted-Printable" content-transfer-encoding defined in RFC 2045. It is designed to allow text containing mostly ASCII characters to be decipherable on an ASCII terminal without decoding.
(1) Any 8-bit value may be represented by a "=" followed by two hexadecimal digits. For example, if the character set in use were ISO-8859-1, the "=" character would thus be encoded as "=3D", and a SPACE by "=20". (Upper case should be used for hexadecimal digits "A" through "F".)
(2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be represented as "_" (underscore, ASCII 95.). (This character may not pass through some internetwork mail gateways, but its use will greatly enhance readability of "Q" encoded data with mail readers that do not support this encoding.) Note that the "_" always represents hexadecimal 20, even if the SPACE character occupies a different code position in the character set in use.
(3) 8-bit values which correspond to printable ASCII characters other than "=", "?", and "_" (underscore), MAY be represented as those characters. (But see section 5 for restrictions.) In particular, SPACE and TAB MUST NOT be represented as themselves within encoded words.
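So the underline does have to become a space, and a literal underline would indeed arrive as =5F. As a rough illustration of rules (1) and (2) only, here is a minimal sketch in Python. It is not the code I sent earlier; the names q_decode and decode_header_value are made up for the example. It handles "Q" encoded-words only (no "B" encoding) and does not implement the other parts of RFC 2047, such as dropping whitespace between adjacent encoded-words.

```python
import re

# Matches one RFC 2047 encoded-word that uses the "Q" encoding:
#   =?charset?Q?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Qq]\?([^?\s]*)\?=")


def q_decode(encoded_text: str) -> bytes:
    """Turn the encoded-text part of a "Q" encoded-word into raw bytes."""
    # Rule (2): "_" always stands for the byte 0x20. Rewriting it as "=20"
    # before the hex pass keeps a literal underscore (sent as "=5F") intact.
    encoded_text = encoded_text.replace("_", "=20")

    # Rule (1): "=" followed by two hexadecimal digits is a single byte.
    def hex_to_byte(match):
        return bytes([int(match.group(1), 16)])

    return re.sub(rb"=([0-9A-Fa-f]{2})", hex_to_byte,
                  encoded_text.encode("ascii"))


def decode_header_value(value: str) -> str:
    """Replace every "Q" encoded-word in a header value with decoded text."""
    def decode_word(match):
        charset, encoded_text = match.group(1), match.group(2)
        # Assumes the charset label is one that Python's codecs recognise.
        return q_decode(encoded_text).decode(charset)

    return ENCODED_WORD.sub(decode_word, value)


header = "hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?="
print(decode_header_value(header))
```

The one point worth noting is the order of operations: the underscore is translated before the =XX sequences are decoded, otherwise an underscore that was deliberately sent as =5F would be turned into a space as well.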
JD