A show of hands, please...
How many of you feel that we've arrived at the point of diminishing
returns in trying to parse badly formed Asian-language messages?
So far, we've solved a lot of problems, and I think the Citadel system as
a whole is better for it:
* We now accept "Bernstein encoded" messages (messages containing
characters in the 0x7F through 0xFE range, even though we have not
advertised the receiver as 8-bit clean)
* WebCit now encodes HTML transmittals using quoted-printable (QP)
encoding, preventing locally composed messages from getting mangled. It
also declares the desired charset for the form.
* MIME parts of type text/html whose charset is illegally declared inside
the HTML instead of in the MIME headers are now displayed properly by
WebCit.
* We even alias known-incorrect charset names when we can, e.g.
MS950-->CP950.
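For illustration, the aliasing step in that last item could be sketched as a simple lookup table. This is Python for brevity (the actual code is not shown here); the function name `canonical_charset` is my own, and only the MS950 --> CP950 mapping comes from this message:

```python
# Minimal sketch of charset-name aliasing. Only the MS950 -> CP950
# entry is taken from the discussion above; the helper itself is
# hypothetical, not the real WebCit implementation.
CHARSET_ALIASES = {
    "ms950": "cp950",  # Microsoft's label for its Big5 variant
}

def canonical_charset(declared: str) -> str:
    """Return a decoder-friendly name for a possibly bogus charset label."""
    return CHARSET_ALIASES.get(declared.strip().lower(), declared)
```

Unknown names pass through unchanged, so the normal decode path still gets a chance to handle them.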
At this point, any further work in this area is going to require a lot of
effort. For example, the most recent batch of "badly displayed" messages
contains headers that use an undeclared character set, which is completely
illegal, and there's no easy way to guess what was intended. We could scan
headers for illegal characters and, when we find them, fish into the MIME
structure to try to make our best guess, but doing so would involve a
gigantic refactoring of the display code.
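The detection half of that idea is the cheap part; a rough sketch (Python for illustration, `header_has_raw_8bit` is a hypothetical helper, and the 0x7F-0xFE range follows the byte range mentioned earlier in this message):

```python
def header_has_raw_8bit(raw_header: bytes) -> bool:
    """True if a raw message header contains undeclared bytes in the
    0x7F-0xFE range -- the hint that would trigger the fallback of
    digging into the MIME structure to guess a charset."""
    return any(0x7F <= b <= 0xFE for b in raw_header)
```

The expensive part, as noted above, is what to do once this returns True: wiring a best-guess charset back through the display code.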
So we're at a point where larger and larger amounts of work are going to
be required in order to fix smaller and smaller problems. Having arrived
at this point, I'm inclined to stop here and move on to addressing other
users' concerns. Thoughts on this?