Hi,
From: "Jason R. Mastaler"
Subject: Re: handling internationalized headers
Date: Wed, 16 Oct 2002 10:43:26 -0600
> [EMAIL PROTECTED] writes:
>
> > I don't suppose it's possible for the code to guess whether a header
> > should be encoded or not...IIUC, no headers should contain any 8-bit
> > values so if a header does, I presume it should undergo appropriate
> > MIME encoding.
>
> No, because it's completely possible to have a sequence of 7-bit-only
> bytes in a charset that has nothing to do with US-ASCII. Guessing
> that there are no 8-bit bytes in that header and not including the
> MIME encoding if so will discard the charset information.
Hmmm, perhaps this is correct in general if you don't know what
language(s) you are dealing w/.
> ISO-2022-JP is a perfect example. It's a completely 7-bit encoding,
> but the characters in it have nothing to do with US-ASCII (other than
> the 'escape' characters).
If you know you are dealing w/ Japanese, ISO-2022-JP is trivial to
detect because of the escape sequence.
[ One of the reasons I used ISO-2022-JP instead of EUC-JP on web pages
for a while was precisely because it was the one which could be
detected unambiguously in a Japanese environment. ]
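
For illustration, a minimal sketch of that escape-sequence check in
Python. The name looks_like_iso2022jp is made up for this example and
is not part of TMDA; the pattern covers only the four escape sequences
listed in RFC 1468, and the check assumes you already know the text is
Japanese.

```python
import re

# Escape sequences used by ISO-2022-JP (RFC 1468):
#   ESC ( B  -> ASCII
#   ESC ( J  -> JIS X 0201-1976 Roman
#   ESC $ @  -> JIS X 0208-1978
#   ESC $ B  -> JIS X 0208-1983
ISO2022JP_ESCAPE = re.compile(rb"\x1b(?:\$[@B]|\([BJ])")


def looks_like_iso2022jp(raw: bytes) -> bool:
    """Return True if raw is 7-bit clean and contains an ISO-2022-JP escape."""
    # ISO-2022-JP never uses bytes above 0x7F, so any 8-bit byte rules it out.
    if any(b > 0x7F for b in raw):
        return False
    return ISO2022JP_ESCAPE.search(raw) is not None
```

EUC-JP text fails the first test because it relies on 8-bit bytes,
which is what makes the two easy to tell apart in a Japanese-only
environment.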
I guess the questions in my mind are:
1) Is it likely that multiple non-ASCII character sets will appear
in headers for a single message?
2) If not, is knowing the language enough to perform detection to
determine whether to encode a header value (or a portion of a
header value)?
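
To make question 2 concrete, here is a rough sketch of what I have in
mind, using the standard email.header module. It assumes the charset
is known up front (e.g. from a single per-user setting); maybe_encode
is a hypothetical helper, not an existing TMDA function.

```python
from email.header import Header


def maybe_encode(raw: bytes, charset: str) -> str:
    """RFC 2047 encode raw only if it is not plain US-ASCII text.

    Assumption: the caller knows which charset raw is written in.
    """
    # An 8-bit byte or an ESC (0x1B) means the value is not plain ASCII;
    # ESC catches 7-bit encodings such as ISO-2022-JP.
    needs_encoding = any(b > 0x7F or b == 0x1B for b in raw)
    if not needs_encoding:
        return raw.decode("ascii")
    # Header decodes raw using charset and raises UnicodeError if the
    # bytes are not valid in that charset.
    return Header(raw, charset).encode()
```

With something like this the user would state the charset once and
would not have to list which headers need encoding.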
What I'm not too excited about is the user having to focus on
which headers need to be MIME-encoded and which don't. Spelling this
out explicitly and wiring it into some configuration file doesn't
appeal to me.
Practically speaking, perhaps the only two headers that are likely to
need encoding are From and Subject, so maybe this is not really much
of an issue anyway.
As a side note, I don't imagine it'll be much of an issue from a
practical standpoint because of rarity, but I have seen multiple
charsets used in a single header value, e.g. when From or To has
multiple addresses.
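
For what it's worth, the standard email.header module can build such a
mixed-charset value. A small sketch, where build_mixed_header and the
example.org addresses are invented for illustration:

```python
from email.header import Header


def build_mixed_header(parts):
    """Build one header value from (text, charset) chunks.

    Each chunk is encoded in its own charset; us-ascii chunks are
    emitted as-is, so the addresses themselves stay unencoded.
    """
    header = Header()
    for text, charset in parts:
        header.append(text, charset)
    return header.encode()


to_value = build_mixed_header([
    ("山田", "iso-2022-jp"),
    ("<yamada@example.org>,", "us-ascii"),
    ("Müller", "iso-8859-1"),
    ("<mueller@example.org>", "us-ascii"),
])
```

Each display name ends up as its own encoded word tagged with its own
charset, so the charset information survives per chunk.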
Anyway, I suppose no one is forcing me to use this mechanism so I
think I'll go quiet on this now (-;
P.S. FWIW, a quick grep of my mail gives the following headers as also
having used non-ASCII (mime-encoded of course):
To (though in this case, I imagine it's already appropriately encoded)
X-cite-me
X-Weather
Content-Disposition (in the filename portion)
Thread-Topic
Organization
_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers