On Mon, May 24, 2004 at 12:28:53PM +0900, Hatuka*nezumi - IKEDA Soji wrote:
> RFC2047 says quoted-printable "is designed to allow text 
> containing mostly ASCII characters to be decipherable on an ASCII 
> terminal without decoding".  In general, a UTF-8 text doesn't 
> contain "mostly ASCII".

I disagree there. In my opinion, in an ideal world, *everyone* would be
sending text in UTF-8, regardless of language. Maybe one day we'll reach
that point.

Suppose you write English text and include a Japanese quotation within it,
or vice versa? Both could be UTF-8. To me it doesn't make sense to force one
transfer encoding for UTF-8, nor to make the user choose, since they almost
certainly don't care.

A simple algorithm here would be to encode the body both ways and see which
comes out shorter.
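A minimal sketch of that idea in Python, assuming the body is already
encoded to bytes in the target charset. The helper name
choose_transfer_encoding is hypothetical, and the stdlib quopri and base64
modules stand in for whatever encoders a real mailer uses:

```python
import base64
import quopri


def choose_transfer_encoding(body: bytes) -> tuple[str, bytes]:
    """Encode body both ways and return (encoding name, encoded bytes)
    for whichever result is shorter.

    Hypothetical helper for illustration. Ties go to quoted-printable,
    which stays readable on an ASCII terminal.
    """
    qp = quopri.encodestring(body)    # escapes non-ASCII octets as =XX
    b64 = base64.encodebytes(body)    # 4 output octets per 3 input octets
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


english = ("The meeting is on Tuesday. " * 3 + "Thanks, 田中\n").encode("utf-8")
japanese = "これは日本語の文章です。\n".encode("utf-8")
print(choose_transfer_encoding(english)[0])   # quoted-printable
print(choose_transfer_encoding(japanese)[0])  # base64
```

Both outputs include their line breaks, so the comparison is between what
would actually go on the wire (modulo LF vs CRLF).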

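base64 makes the message size increase by around 1/3. quoted-printable
turns every non-ASCII octet into three (=XX), so those octets grow by a
factor of 3.

By my reckoning, if a fraction p of the octets need quoted-printable
encoding, the QP output is about 1 + 2p times the original size, so base64
is shorter once p exceeds 1/6 (about 16.7%), ignoring line-break overhead.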
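A rough check of the break-even point, under a simplified size model that is
an assumption: a fraction p of the octets must be escaped as =XX, while line
breaks, base64 padding, and escaped ASCII characters such as "=" are ignored.
Note that the threshold is in octets; since UTF-8 encodes a Japanese
character as three octets, the threshold measured in characters is lower.

```python
from fractions import Fraction

# Simplified size model (an assumption): no line breaks, no padding.
BASE64_RATIO = Fraction(4, 3)   # every 3 input octets become 4 output octets
QP_ESCAPED_RATIO = 3            # an escaped octet becomes "=XX", 3 octets


def qp_ratio(p: Fraction) -> Fraction:
    """QP output/input size ratio when a fraction p of octets is escaped."""
    return (1 - p) + QP_ESCAPED_RATIO * p   # simplifies to 1 + 2p


# Break-even: 1 + 2p = 4/3  =>  p = 1/6
break_even = (BASE64_RATIO - 1) / (QP_ESCAPED_RATIO - 1)
print(break_even, float(break_even))  # 1/6 0.16666666666666666
```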

For other character sets such as ISO-8859-1 or ISO-2022-JP, it may make
sense to hard-code the choice of encoding, because the preference there is
mostly for the benefit of backwards compatibility with non-MIME-compliant
mailers.
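One way to combine the two approaches, sketched in Python: a fixed table for
legacy charsets and a size comparison for everything else, including UTF-8.
The table contents and the pick_cte name are hypothetical and illustrative,
not taken from any particular mailer:

```python
import base64
import quopri

# Hypothetical per-charset table: legacy charsets get a fixed
# Content-Transfer-Encoding; anything not listed falls back to size.
FIXED_CTE = {
    "iso-8859-1": "quoted-printable",  # mostly ASCII plus accented letters
    "iso-2022-jp": "7bit",             # already 7-bit, no encoding needed
}


def pick_cte(charset: str, body: bytes) -> str:
    """Return the fixed choice for charset if there is one, else the
    shorter of quoted-printable and base64 for this particular body."""
    fixed = FIXED_CTE.get(charset.lower())
    if fixed is not None:
        return fixed
    # No fixed preference (e.g. UTF-8): compare actual encoded sizes.
    qp_len = len(quopri.encodestring(body))
    b64_len = len(base64.encodebytes(body))
    return "quoted-printable" if qp_len <= b64_len else "base64"


print(pick_cte("ISO-8859-1", "café\n".encode("latin-1")))  # quoted-printable
print(pick_cte("UTF-8", "これは日本語の文章です。\n".encode("utf-8")))  # base64
```

Leaving UTF-8 out of the table means the decision follows the content of
each message rather than the charset label.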

> + For the same reason, I worry that some Latin-based MUAs may be able
>   to handle only quoted-printable text parts.

Spammers routinely base64-encode their mail to try and bypass filters, so I
think most MUAs can handle it. And of course, they wouldn't be MIME
compliant if they couldn't.

> I think the best practice is to determine the encoding method by
> fixed flags (recommended for each charset)

However, I'm not convinced that choosing UTF-8 necessarily says anything at
all about the language it encodes or the subset of characters likely to be
used within it.

Just my 2c.

Brian.
