On Thu, 17 Jun 2004 11:49:46 -0700 (PDT), Mark Crispin
<[EMAIL PROTECTED]> wrote:
On Thu, 17 Jun 2004, Shawn Walker wrote:
I'm trying to use utf8_mime2text() to convert a UTF-8 string to
text.
The sample string that I'm trying to convert is:
=?UTF-8?Q?Us=C3=A9r T=C3=A9st?=
That is not a proper MIME quoted-word, and consequently utf8_mime2text()
declines to deal with it.
What is the correct method of decoding the strings?
That string is properly MIME decoded as the text
=?UTF-8?Q?Us=C3=A9r T=C3=A9st?=
A proper MIME quoted-word, such as
=?UTF-8?Q?Us=C3=A9r_T=C3=A9st?=
would be decoded into a string with 8-bit characters.
The syntax for MIME quoted words was very carefully selected so that
there would be no mistaken decodings. It is necessary to consider the
effects of RFC 2822 line wrapping with spaces. Consequently, spaces are
forbidden within quoted words.
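
To illustrate the rule, here is a minimal sketch of a strict decoder in Python. It is not the c-client implementation of utf8_mime2text(); the names ENCODED_WORD, decode_q, and decode_strict are hypothetical and exist only for this example. It assumes the RFC 2047 shape =?charset?encoding?text?= with no whitespace anywhere inside, and it returns the input unchanged when that shape is not met.

```python
import base64
import re

# Assumed shape: =?charset?Q-or-B?text?= with no whitespace or "?" inside a field.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]+)\?=")


def decode_q(payload):
    """Decode Q-encoded text: "_" means space, "=XX" is a hex-encoded byte."""
    out = bytearray()
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            pair = payload[i + 1:i + 3]
            if len(pair) != 2:
                raise ValueError("truncated =XX escape")
            out += bytes.fromhex(pair)  # raises ValueError on non-hex digits
            i += 3
        else:
            out.append(ord(ch))  # raises ValueError for code points above 255
            i += 1
    return bytes(out)


def decode_strict(word):
    """Return the decoded text, or the input unchanged if it is not valid."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return word  # e.g. a literal space inside the word
    charset, encoding, payload = m.groups()
    try:
        if encoding.upper() == "Q":
            raw = decode_q(payload)
        else:
            raw = base64.b64decode(payload, validate=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # Bad escape, bad base64, undecodable bytes, or unknown charset.
        return word


print(decode_strict("=?UTF-8?Q?Us=C3=A9r_T=C3=A9st?="))
print(decode_strict("=?UTF-8?Q?Us=C3=A9r T=C3=A9st?="))
```

The first call prints the decoded name, because the space is written as an underscore. The second prints its input unchanged, because the literal space means the string is not an encoded word at all.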
I understand the argument of "why aren't you more forgiving in an
obvious case such as this?". The answer is that you should take a look
at Outlook. Many of the exploits in Outlook involve attacking Outlook's
willingness to "just work" in "obvious cases." Without strict rules,
software has no clear guidelines that enable it to reject absurd
cases. To make matters worse, the code paths that forgive such "obvious
forgivable cases" are rarely exercised, because most data complies with
the rules. This creates a fertile breeding ground
for bugs.
I finally figured out that the quoted string is invalid. If it's invalid, I
say "tough" and blame it on the person who formed an invalid quoted string.