On Mon, May 24, 2004 at 12:28:53PM +0900, Hatuka*nezumi - IKEDA Soji wrote:
> RFC2047 says quoted-printable "is designed to allow text
> containing mostly ASCII characters to be decipherable on an ASCII
> terminal without decoding". In general, a UTF-8 text doesn't
> contain "mostly ASCII".

I disagree there. In my opinion, in an ideal world, *everyone* would be
sending text in UTF-8, regardless of language. Maybe one day we'll reach
that point.

Suppose you write English text and include a Japanese quotation within
it, or vice versa? Both could be UTF-8.

To me it doesn't make sense to force one encoding for UTF-8, nor to make
the user (who almost certainly doesn't care) choose.

A simple algorithm here would be to encode using both, and see which
comes out shorter.

base64 makes the message size increase by around 1/3. quoted-printable
makes each byte that needs encoding grow by a factor of 3. So if a
fraction p of the bytes needs quoted-printable encoding, the message
grows to about 1 + 2p times its size, against 4/3 for base64. By my
reckoning, if more than about 1/6 (16.7%) of the bytes need
quoted-printable encoding, then base64 is shorter (ignoring line-break
overhead).

For other character sets like ISO-8859-1 or ISO-2022-JP, it may make
sense to hard-code the choice of encoding, because the preference is
mostly for the benefit of backwards-compatibility with
non-MIME-compliant mailers.

> + By same reason, I worry some Latin-based MUAs would be able to
> handle only quoted-printable text part.

Spammers routinely base64-encode their mail to try and bypass filters,
so I think most MUAs can handle it. And of course, they wouldn't be MIME
compliant if they couldn't.

> I think the best practice is to determin encoding method by
> fixed flags (recommended by each charset)

However, I'm not convinced that the selection of UTF-8 necessarily makes
any declaration at all about the language it encodes or the subset of
characters which are likely to be used within it.

Just my 2c.

Brian.
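
P.S. A minimal sketch of the "encode using both, and see which comes out
shorter" idea, in Python. It assumes the standard-library base64 and
quopri modules are acceptable stand-ins for whatever encoder a mailer
really uses; the function name pick_transfer_encoding is made up for
illustration and is not taken from any existing code.

```python
import base64
import quopri


def pick_transfer_encoding(body: bytes) -> str:
    """Return the name of whichever encoding makes `body` shorter.

    Hypothetical helper: encodes the body both ways and compares the
    encoded lengths, line breaks included. Ties go to quoted-printable,
    since it stays readable without decoding.
    """
    b64 = base64.encodebytes(body)   # base64 with a newline every 76 chars
    qp = quopri.encodestring(body)   # quoted-printable with soft line breaks
    return "base64" if len(b64) < len(qp) else "quoted-printable"


if __name__ == "__main__":
    english = "In an ideal world everyone would be sending text in UTF-8. " * 3
    japanese = "日本語のテキスト" * 10
    for label, text in (("english", english), ("japanese", japanese)):
        print(label, pick_transfer_encoding(text.encode("utf-8")))
```

Encoding twice costs a little CPU, but it avoids guessing from the
charset name alone, and the user never has to be asked.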
