Re: [sqwebmail] Re: content encoding etc. (updated)

Brian Candler Mon, 24 May 2004 02:28:18 -0700

On Mon, May 24, 2004 at 12:28:53PM +0900, Hatuka*nezumi - IKEDA Soji wrote:
> RFC2047 says quoted-printable "is designed to allow text 
> containing mostly ASCII characters to be decipherable on an ASCII 
> terminal without decoding".  In general, a UTF-8 text doesn't 
> contain "mostly ASCII".


I disagree there. In my opinion, in an ideal world, *everyone* would be
sending text in UTF-8, regardless of language. Maybe one day we'll reach
that point.

Suppose you write English text and include a Japanese quotation within it,
or vice versa? Both could be UTF-8. To me it doesn't make sense to force one
transfer encoding for all UTF-8 text, nor to make the user (who almost
certainly doesn't care) choose.

A simple algorithm here would be to encode using both, and see which comes
out shorter.
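
As a rough sketch of that idea in Python (the name choose_transfer_encoding
is made up for illustration, and the standard-library quopri and base64
modules stand in for whatever encoder the mailer really uses; line-ending
canonicalisation and headers are ignored here):

```python
import base64
import quopri


def choose_transfer_encoding(text, charset="utf-8"):
    """Encode with both schemes and return (name, body) for the shorter one.

    Hypothetical helper for illustration only. Ties go to quoted-printable
    because it stays readable without decoding.
    """
    raw = text.encode(charset)
    qp = quopri.encodestring(raw)   # quoted-printable body
    b64 = base64.encodebytes(raw)   # base64 body, wrapped with newlines
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


if __name__ == "__main__":
    # One mostly-ASCII sample and one all-Japanese sample.
    for sample in ("Hello world", "\u3053\u3093\u306b\u3061\u306f"):
        name, body = choose_transfer_encoding(sample)
        print(name, len(body))
```

The cost is encoding the body twice, which is probably acceptable for
typical message sizes.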

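base64 makes the message size increase by around 1/3. quoted-printable makes
each octet that needs encoding grow by a factor of 3 (and a non-ASCII
character in UTF-8 is two or more such octets).

By my reckoning, each encoded octet costs two extra, so the two come out
equal when 2p = 1/3. If more than about 16.7% (one in six) of the octets
need quoted-printable encoding, then base64 is shorter, ignoring line breaks.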

For other character sets like ISO-8859-1 or ISO-2022-JP, it may make
sense to hard-code the choice of encoding, because the preference is mostly
for the benefit of backwards-compatibility with non-MIME-compliant mailers.

> + By same reason, I worry some Latin-based MUAs would be able to 
>   handle only quoted-printable text part.

Spammers routinely base64-encode their mail to try and bypass filters, so I
think most MUAs can handle it. And of course, they wouldn't be MIME
compliant if they couldn't.

> I think the best practice is to determine encoding method by 
> fixed flags (recommended by each charset)

However, I'm not convinced that the selection of UTF-8 necessarily makes any
declaration at all about the language it encodes or the subset of characters
which are likely to be used within it.

Just my 2c.

Brian.
