Karsten Bräckelmann wrote:
>
>>> Maybe the devs can briefly explain how the charset is being determined.
>>> Or at least, where exactly in the code one could find it...
>>>       
>
> Matt, also, I got a feeling, that logic is what the OP is actually
> about. He does not want to leave out what he wants to be scored on. But
> (positively) define it.
>   

That much is easy. It's done by looking at various character-set tags or
encoding marks in the message.  These explicitly specify which character
set to use when interpreting the text.

Re-quoting myself from 11/26 (and elaborating with more examples):

CHARSET_FARAWAY
Underlying eval function: check_for_faraway_charset() in MIMEEval.pm
Detects based on: the character set in the MIME Content-Type: header of
the message itself.

Example (in a message header): 
Content-Type: text/plain;
        charset="iso-2022-jp"

which specifies Japanese text for a single-part message.
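
To make the idea concrete, here is a rough Python sketch of reading that charset out of a message header. It is not SpamAssassin's Perl code; the function names and the ALLOWED_CHARSETS allow-list are made up for illustration, and the real rule decides what counts as "faraway" from its own configuration.

```python
from email import message_from_string

# Hypothetical allow-list, for illustration only.
ALLOWED_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}

RAW_MESSAGE = (
    "Subject: test\n"
    "Content-Type: text/plain;\n"
    '        charset="iso-2022-jp"\n'
    "\n"
    "body text\n"
)

def header_charset(raw_message):
    """Return the lowercased charset from the top-level Content-Type, or None."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset()

def is_faraway(charset):
    """True when a charset is declared and is not on the allow-list."""
    return charset is not None and charset not in ALLOWED_CHARSETS

declared = header_charset(RAW_MESSAGE)
print(declared, is_faraway(declared))
```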

MIME_CHARSET_FARAWAY
Underlying eval function: check_for_mime('mime_faraway_charset') in
MIMEEval.pm
Detects based on: the character set in the MIME Content-Type: header of
each part (attachment) of the message.

Example (in a MIME-section header):
Content-Type: text/plain;
        charset="iso-2022-jp"

which specifies Japanese text for this part of a multi-part message.
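
As a sketch of the per-part variant, the following Python walks a multi-part message and collects the charset declared on each leaf part. Again, this only illustrates the idea and is not the SpamAssassin implementation; part_charsets() and the sample message are invented for the example.

```python
from email import message_from_string

RAW_MULTIPART = (
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: text/plain;\n"
    '        charset="iso-2022-jp"\n'
    "\n"
    "text\n"
    "--XX--\n"
)

def part_charsets(raw_message):
    """Return the declared charset (or None) of every non-multipart part."""
    msg = message_from_string(raw_message)
    return [
        part.get_content_charset()
        for part in msg.walk()
        if not part.is_multipart()  # skip the multipart container itself
    ]

print(part_charsets(RAW_MULTIPART))
```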



HTML_CHARSET_FARAWAY
Underlying eval function: html_charset_faraway() in HTMLEval.pm
Detects based on: character set in the Content-Type: of a meta
http-equiv tag embedded in HTML.

Example:
<META http-equiv=Content-Type content="text/html; charset=iso-2022-jp">

which specifies Japanese text for this HTML document.
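
A minimal Python sketch of pulling the charset out of such a meta tag is below. It assumes a simple regular expression is good enough to read the charset parameter, which will not cover every malformed page; the class and function names are hypothetical and this is not how HTMLEval.pm is written.

```python
import re
from html.parser import HTMLParser

# Matches charset=value, with or without quotes around the value.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

class MetaCharsetParser(HTMLParser):
    """Remembers the charset from the first <meta http-equiv=Content-Type>."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        match = CHARSET_RE.search(attr.get("content", ""))
        if match:
            self.charset = match.group(1).lower()

def html_meta_charset(html_text):
    parser = MetaCharsetParser()
    parser.feed(html_text)
    return parser.charset

SAMPLE = '<META http-equiv=Content-Type content="text/html; charset=iso-2022-jp">'
print(html_meta_charset(SAMPLE))
```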


CHARSET_FARAWAY_HEADER
Underlying eval function: check_for_faraway_charset_in_headers()
Detects based on: embedded character-encoding marks in the Subject: and
From: headers. You'd have to look at the raw message source to see them,
but they generally look like this somewhere in the header:

=?GB2312?

which indicates that encoded Simplified Chinese text follows.
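
For illustration, a small Python sketch that finds these marks in a raw header value and lists the charsets they name. The regular expression is a simplified reading of the =?charset?encoding?text?= form, and header_charsets() is a made-up helper, not the SpamAssassin function.

```python
import re

# Simplified pattern for =?charset?B?...?= or =?charset?Q?...?= marks.
ENCODED_WORD_RE = re.compile(r"=\?([^?\s]+)\?[bBqQ]\?[^?\s]*\?=")

def header_charsets(header_value):
    """Return the lowercased charset named by each encoded word found."""
    return [m.group(1).lower() for m in ENCODED_WORD_RE.finditer(header_value)]

RAW_SUBJECT = "=?GB2312?B?xOO6ww==?= special offer"
print(header_charsets(RAW_SUBJECT))
```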

