Kiyokazu SUTO writes:
> I don't think that SqWebMail can handle ISO-2022-JP (the character
> encoding scheme (CES) used in Japanese e-mail) even if it included
> mapping tables between the coded character sets (CCSs) in the
> scheme and Unicode.
>
> ISO-2022-JP is a 7-bit CES (i.e., it uses one or two octets in the range
> 0x21..0x7E to represent a character), and it switches between 4 CCSs
> (US-ASCII, JIS X 0201 Roman, JIS C 6226, and JIS X 0208) using the
> following escape sequences:
>
> US-ASCII         : 0x1B 0x28 0x42
> JIS X 0201 Roman : 0x1B 0x28 0x4A
> JIS C 6226       : 0x1B 0x24 0x40
> JIS X 0208       : 0x1B 0x24 0x42
>
> On the other hand, SqWebMail assumes that the range 0x21..0x7E is used
> only by US-ASCII, and that any non-US-ASCII character is represented
> by one or more octets in the range 0x80..0xFF.

No, not really. SqWebMail's only assumption is that a character set can be
mapped to and from Unicode. Non-US-ASCII charsets can generally use
0x21..0x7E, except for the HTML defanging issue, which I'll mention shortly.
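
For reference, the escape sequences quoted above are enough to split an
iso-2022-jp octet stream into runs, each tagged with the character set that
is active for it. The following is only a rough Python sketch of that idea,
not SqWebMail code; the names ESCAPES and segment are made up for this
example, and it handles only the four sequences listed in the quote.

```python
import re

# The four escape sequences from the quoted message, mapped to the
# character set each one selects.
ESCAPES = {
    b"\x1b(B": "US-ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226",
    b"\x1b$B": "JIS X 0208",
}

# Character octets stay within 0x21..0x7E, so 0x1B can only start an
# escape sequence and splitting on the sequences is unambiguous.
ESC_RE = re.compile(b"(" + b"|".join(re.escape(e) for e in ESCAPES) + b")")


def segment(data):
    """Split iso-2022-jp octets into (charset, octets) runs."""
    runs = []
    current = "US-ASCII"  # a stream starts out in US-ASCII
    for part in ESC_RE.split(data):
        if part in ESCAPES:
            current = ESCAPES[part]  # an escape sequence switches the charset
        elif part:
            runs.append((current, part))
    return runs


if __name__ == "__main__":
    for charset, octets in segment(b"abc\x1b$B$<\x1b(Bdef"):
        print(charset, octets)
```

The point is that the same octets mean different things depending on which
run they fall in, so the "current charset" is a piece of state.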

Someone else mailed me some links to look over. It appears that the major
stumbling block is that the Unicode mapper currently does not carry any
state between successive mappings to or from Unicode. SqWebMail first maps
the message's text/plain content to Unicode, according to its MIME charset,
then from Unicode to the browser client's MIME charset. To do this correctly
with iso-2022-jp, the mapper has to remember which character set the most
recent escape sequence selected, and right now nothing is remembered from
one call to the next.
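
To illustrate why the missing state matters, here is a small Python sketch
(again, not SqWebMail code). It uses Python's own iso2022_jp codec as a
stand-in for the mapper, and the helper names decode_stateless and
decode_stateful are invented for this example. The text is converted in two
chunks, with the chunk boundary falling inside a run of two-octet characters.

```python
import codecs

text = "\u65e5\u672c\u8a9e text"      # three kanji followed by ASCII
raw = text.encode("iso2022_jp")

# Split after the escape sequence (3 octets) and the first two-octet
# character, so the second chunk starts in the middle of the kanji run.
chunks = [raw[:5], raw[5:]]


def decode_stateless(parts):
    """Decode each chunk on its own: the shift state is lost in between."""
    return "".join(p.decode("iso2022_jp", errors="replace") for p in parts)


def decode_stateful(parts):
    """Decode with one incremental decoder that remembers the shift state."""
    decoder = codecs.getincrementaldecoder("iso2022_jp")()
    out = [decoder.decode(p) for p in parts]
    out.append(decoder.decode(b"", final=True))
    return "".join(out)


if __name__ == "__main__":
    print(decode_stateful(chunks) == text)   # state carried across chunks
    print(decode_stateless(chunks) == text)  # second chunk read as ASCII
```

In the stateless version the second chunk no longer knows that a two-octet
character set was selected, so its octets are read as plain US-ASCII.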

The other potential issue is text/html content encoded in iso-2022-jp. The
jis-x-0208 octets fall in the same 7-bit range as US-ASCII, so they definitely
overlap with HTML markup: some of them are the < > (and & and other) octets.
I suppose that text/html in iso-2022-jp always shifts back to US-ASCII
before each < > markup tag. Even so, this is going to cause problems for
SqWebMail's HTML defanger, which eats HTML markup tags in their raw form.
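
The overlap is easy to check. The Python sketch below (an illustration only,
relying on Python's iso2022_jp codec; jis_payload and markup_collisions are
names invented here) encodes each hiragana character on its own, strips the
three-octet escape sequences from both ends, and reports the characters whose
two remaining octets include an HTML-significant one.

```python
MARKUP = b"<>&\"'"   # octets that matter to an HTML parser or defanger


def jis_payload(ch):
    """Return the octets for ch without the surrounding escape sequences."""
    raw = ch.encode("iso2022_jp")
    # A single two-octet character is emitted as: ESC $ B, two octets, ESC ( B.
    return raw[3:-3]


def markup_collisions(chars):
    """Map each character to the markup octets that appear in its encoding."""
    hits = {}
    for ch in chars:
        shared = bytes(b for b in jis_payload(ch) if b in MARKUP)
        if shared:
            hits[ch] = shared.decode("ascii")
    return hits


# Hiragana block U+3041..U+3093; all of these are two-octet characters.
hiragana = [chr(cp) for cp in range(0x3041, 0x3094)]
hits = markup_collisions(hiragana)

if __name__ == "__main__":
    for ch, shared in hits.items():
        print(ch, jis_payload(ch), "contains", shared)
```

Any filter that scans the raw octets for < or > without tracking the escape
sequences will see those characters as markup.
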
--
Sam