Re: Encode::MIME::Header my 2¢

Dan Kogai Sun, 06 Oct 2002 19:52:04 -0700

On Monday, Oct 7, 2002, at 06:14 Asia/Tokyo, Nick Ing-Simmons wrote:
> I have re-started work on Unicode aware perl/Tk - and I am playing
> with it in "tkmail" (as a test app). Obviously Encode::MIME is just
> the thing for a mail tool.
>
> However the encode ops are not ideal:


I know they are not, and one of the reasons is that the module has to
follow the Encode API.  MIME header encoding is, in essence, a double
encoding (characters to bytes, then bytes to B or Q), so it lacks the
fine-grained controls you may want.

> If you encode('MIME-Header',...) it _seems_ to always use the 'B' form
> maybe my tests are not extensive enough.

That one is already documented:

perldoc Encode::MIME::Header
> ABSTRACT
>        This module implements RFC 2047 Mime Header Encoding.
>        There are 3 variant encoding names; "MIME-Header",
>        "MIME-B" and "MIME-Q".  The difference is described below
>
>                      decode()          encode()
>          ----------------------------------------------
>          MIME-Header Both B and Q      =?UTF-8?B?....?=
>          MIME-B      B only; Q croaks  =?UTF-8?B?....?=
>          MIME-Q      Q only; B croaks  =?UTF-8?Q?....?=
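To make the "double encoding" concrete, here is a minimal Python sketch of what the encode() column for MIME-Header and MIME-B produces. It is not Encode's code, which is Perl. Text is first turned into UTF-8 bytes, and those bytes are then Base64-encoded and wrapped in =?UTF-8?B?...?=. As an assumption beyond the table above, the sketch also splits long input into several encoded-words of at most 75 characters without breaking a multi-byte character, as RFC 2047 requires. It also includes a decoder that, like the MIME-Header decode() row, accepts both B and Q. The names encode_b and decode_words are hypothetical.

```python
import base64
import re

# RFC 2047, section 2: an encoded-word may be at most 75 characters long.
MAX_WORD = 75
PREFIX, SUFFIX = "=?UTF-8?B?", "?="
# Base64 emits 4-char groups carrying 3 bytes each:
# (75 - 10 - 2) // 4 * 3 = 45 bytes per encoded-word.
MAX_BYTES = (MAX_WORD - len(PREFIX) - len(SUFFIX)) // 4 * 3


def encode_b(text: str) -> str:
    """Hypothetical helper: text -> one or more UTF-8 "B" encoded-words."""
    chunks, chunk = [], b""
    for ch in text:
        # Step 1: characters -> UTF-8 bytes, one whole character at a time,
        # so no multi-byte character is split across encoded-words.
        data = ch.encode("utf-8")
        if chunk and len(chunk) + len(data) > MAX_BYTES:
            chunks.append(chunk)
            chunk = b""
        chunk += data
    if chunk:
        chunks.append(chunk)
    # Step 2: bytes -> Base64, wrapped as =?UTF-8?B?...?=
    return " ".join(PREFIX + base64.b64encode(c).decode("ascii") + SUFFIX
                    for c in chunks)


ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_words(header: str) -> str:
    """Hypothetical helper: decode both B and Q encoded-words."""
    # Whitespace between adjacent encoded-words is not part of the text.
    header = re.sub(r"\?=\s+(?==\?)", "?=", header)

    def one(m):
        charset, enc, payload = m.groups()
        if enc in "Bb":
            raw = base64.b64decode(payload)
        else:
            # Q: "_" always means 0x20, "=XX" is a hex-encoded byte.
            raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                         lambda h: bytes([int(h.group(1), 16)]),
                         payload.replace("_", " ").encode("ascii"))
        return raw.decode(charset)

    return ENCODED_WORD.sub(one, header)
```

In a real header, the words would be separated by a folding newline plus a space. A plain space keeps the sketch short and decodes the same way.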

The problem is that finer control would require extra arguments to
(en|de)code(), and that would make ordinary (en|de)coding too cumbersome.

> If I encode('MIME-Q',...) (as currently) then I seem to get all the
> ' ' inside the =?UTF-8?Q?...?= and so they become =20 it also seems
> to wrap all the ASCII parts too. While this is not wrong, it makes
> things less readable for mail clients which don't understand and so
> leave the markup for user to see.

I am not sure about that one.  I have received mail expressing the
opposite opinion, asking for strict RFC 2047 compliance (in Jcode),
especially where line folding is concerned.  So I made
Encode::MIME::Header RFC 2047 compliant.  I agree that =20 instead of
'_' may be too much.  Nevertheless, =20 is exactly what RFC 2047
recommends:

RFC 2047
>  As a consequence, unencoded white space
>    characters (such as SPACE and HTAB) are FORBIDDEN within an
>    'encoded-word'.  For example, the character sequence
>
>       =?iso-8859-1?q?this is some text?=
>
>    would be parsed as four 'atom's, rather than as a single 'atom' (by
>    an RFC 822 parser) or 'encoded-word' (by a parser which understands
>    'encoded-words').  The correct way to encode the string "this is some
>    text" is to encode the SPACE characters as well, e.g.
>
>       =?iso-8859-1?q?this=20is=20some=20text?=
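The RFC's point is easy to check mechanically. The hypothetical helpers below are a Python illustration and not part of Encode. One tests whether a string is a single well-formed encoded-word. The other shows the whitespace-separated atoms a naive parser would see instead. The regular expression is a simplified, assumed rendering of the encoded-word syntax, not the RFC's full grammar.

```python
import re

# Simplified encoded-word shape: =?charset?encoding?encoded-text?=
# No part may contain whitespace, and '?' appears only as a delimiter.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def is_single_encoded_word(s: str) -> bool:
    # RFC 2047, section 2 also caps an encoded-word at 75 characters.
    return len(s) <= 75 and ENCODED_WORD.fullmatch(s) is not None


def rfc822_atoms(s: str) -> list:
    # Naive view of a parser that simply splits on whitespace.
    return s.split()


bad = "=?iso-8859-1?q?this is some text?="
good = "=?iso-8859-1?q?this=20is=20some=20text?="
print(is_single_encoded_word(bad), rfc822_atoms(bad))
# False ['=?iso-8859-1?q?this', 'is', 'some', 'text?=']
print(is_single_encoded_word(good), rfc822_atoms(good))
# True ['=?iso-8859-1?q?this=20is=20some=20text?=']
```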

And more on the "Q" encoding:

> 4.2. The "Q" encoding
>
>    The "Q" encoding is similar to the "Quoted-Printable" content-
>    transfer-encoding defined in RFC 2045.  It is designed to allow text
>    containing mostly ASCII characters to be decipherable on an ASCII
>    terminal without decoding.
>
>    (1) Any 8-bit value may be represented by a "=" followed by two
>        hexadecimal digits.  For example, if the character set in use
>        were ISO-8859-1, the "=" character would thus be encoded as
>        "=3D", and a SPACE by "=20".  (Upper case should be used for
>        hexadecimal digits "A" through "F".)
>
>    (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
>        represented as "_" (underscore, ASCII 95.).  (This character may
>        not pass through some internetwork mail gateways, but its use
>        will greatly enhance readability of "Q" encoded data with mail
>        readers that do not support this encoding.)  Note that the "_"
>        always represents hexadecimal 20, even if the SPACE character
>        occupies a different code position in the character set in use.
>
>    (3) 8-bit values which correspond to printable ASCII characters other
>        than "=", "?", and "_" (underscore), MAY be represented as those
>        characters.  (But see section 5 for restrictions.)  In
>        particular, SPACE and TAB MUST NOT be represented as themselves
>        within encoded words.
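Rules (1) to (3) can be sketched as a small Q encoder. This is an illustrative Python sketch, not Encode::MIME::Header's implementation. The encode_q name and its space_as_underscore flag are hypothetical; the flag stands in for the =20 versus '_' choice. As a conservative assumption, the sketch leaves only letters, digits, and ! * + - / as themselves, which is the set RFC 2047 section 5 permits in a phrase. It does not split long input into multiple encoded-words.

```python
# Bytes left as themselves: a conservative subset of rule (3), namely
# letters, digits and ! * + - / (what RFC 2047 section 5 allows in a phrase).
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def encode_q(text: str, charset: str = "UTF-8",
             space_as_underscore: bool = False) -> str:
    """Hypothetical helper: wrap text in a single Q encoded-word."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20 and space_as_underscore:
            out.append("_")             # rule (2): "_" always means hex 20
        elif byte in SAFE:
            out.append(chr(byte))       # rule (3): printable ASCII as itself
        else:
            out.append("=%02X" % byte)  # rule (1): "=" + upper-case hex pair
    return "=?%s?Q?%s?=" % (charset, "".join(out))


print(encode_q("this is some text", "ISO-8859-1"))
# =?ISO-8859-1?Q?this=20is=20some=20text?=
print(encode_q("this is some text", "ISO-8859-1", space_as_underscore=True))
# =?ISO-8859-1?Q?this_is_some_text?=
```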

With this understood,

> Suggestions:
>  - leave ASCII or even iso-8859-1 sequences as such

Only printable ASCII may appear unencoded, so I have to decline this
one.  'MIME-Q' is already implemented that way.  The bottom line is that
I do not want to give up RFC 2047 conformance.

>  - wrap sequences of ch > 0xff in whichever of 'Q' or 'B' is shorter
>    (do both encodings and throw one away).

I'll consider this one instead; at least it does not breach RFC 2047.
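Nick's suggestion amounts to producing both encodings and keeping the shorter one. Below is a minimal Python sketch. It assumes UTF-8 and treats the whole string as one run, rather than only runs of characters above 0xFF as proposed. The names q_word, b_word, and shorter_word are hypothetical and not part of Encode. Q wins a tie because it stays partly readable in clients that show the raw header.

```python
import base64

# Conservative set of bytes that Q leaves literal (letters, digits, ! * + - /).
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_word(text: str, charset: str = "UTF-8") -> str:
    body = "".join(chr(b) if b in SAFE else "=%02X" % b
                   for b in text.encode(charset))
    return "=?%s?Q?%s?=" % (charset, body)


def b_word(text: str, charset: str = "UTF-8") -> str:
    body = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, body)


def shorter_word(text: str, charset: str = "UTF-8") -> str:
    """Encode both ways and throw the longer one away (ties go to Q)."""
    q, b = q_word(text, charset), b_word(text, charset)
    return q if len(q) <= len(b) else b


print(shorter_word("Hello"))   # =?UTF-8?Q?Hello?=
print(shorter_word("日本語"))  # =?UTF-8?B?5pel5pys6Kqe?=
```

Both encodings are cheap to compute, so the extra work is negligible for header-sized strings. Both outputs are RFC 2047 encoded-words, which is why this approach does not conflict with conformance.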

> Are patches in that direction likely to be accepted or do I build
> a MIME-Smart on top ?

As I said, Encode::MIME::Header has these restrictions:

* the Encode API
* RFC 2047

This is very restrictive given the nature of MIME header encoding.
Surprisingly, the Encode::MIME namespace itself remains empty, so maybe
we can make use of it....

Dan the Encode Maintainer
