A show of hands, please...  
   
 How many of you feel that we've arrived at the point of diminishing 
returns, in the area of trying to parse badly formed Asian-language 
messages?  
   
 So far, we've solved a lot of problems, and I think the Citadel system as 
a whole is better for it:  
   
 * We now accept "Bernstein encoded" messages (messages containing 
characters in the 0x7F through 0xFE range, even though we have not 
advertised the receiver as 8-bit clean)  
   
 * WebCit now encodes HTML transmittals in QP-encoding, preventing locally 
composed messages from getting mangled.  It also declares the desired 
charset for the form.  
   
 * MIME parts of type text/html whose charset is illegally declared inside 
the HTML instead of in the MIME headers are now displayed properly by 
WebCit.  
   
 * We even alias known-incorrect charset names when we can, e.g. 
MS950-->CP950.  
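
 To make the last two items concrete, here is a minimal sketch in Python 
(not the actual Citadel or WebCit code, which is not shown here).  The 
names CHARSET_ALIASES, normalize_charset and effective_charset are 
hypothetical, and the only alias taken from this message is 
MS950-->CP950.  It assumes the MIME header charset wins when present, and 
that a charset declared inside the HTML is used only as a fallback.  

```python
import re

# Hypothetical alias table; only MS950 -> CP950 comes from the message above.
CHARSET_ALIASES = {"MS950": "CP950"}

# Finds "charset=..." inside an HTML <meta> tag (quoted or unquoted value).
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def normalize_charset(name):
    """Upper-case a charset name and map known-incorrect names to real ones."""
    cleaned = name.strip().strip('"').upper()
    return CHARSET_ALIASES.get(cleaned, cleaned)


def effective_charset(mime_charset, html_body, default="US-ASCII"):
    """Prefer the MIME header charset; otherwise look inside the HTML."""
    if mime_charset:
        return normalize_charset(mime_charset)
    match = META_CHARSET.search(html_body)
    if match:
        return normalize_charset(match.group(1).decode("ascii"))
    return default
```

 A caller would pass whatever charset the MIME headers declared (or None) 
plus the raw bytes of the text/html part, and decode the part with the 
name that comes back.  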
   
 At this point, any further work in this area is going to require a lot of 
effort.  For example, the most recent batch of "badly displayed" messages 
contains headers that use an undeclared character set, which is completely 
illegal.  There's no easy way to guess, either.  There is the idea of 
scanning headers for illegal characters, and if they're found, fishing 
into the MIME structure to try to make our best guess, but doing so would 
involve a gigantic refactoring of the display code.  
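
 For anyone who wants to see what that idea amounts to, here is a rough 
sketch in Python.  It is not how the display code works today; the 
function names are hypothetical, and it assumes the caller has already 
dug a charset out of one of the MIME body parts (body_charset), which is 
exactly the part that would need the refactoring.  Properly encoded 
headers are pure ASCII on the wire, so any byte of 0x80 or above is 
treated as a sign of an undeclared character set.  

```python
def has_undeclared_8bit(raw_header):
    """True if the raw header bytes contain anything outside 7-bit ASCII."""
    return any(byte >= 0x80 for byte in raw_header)


def decode_header_best_guess(raw_header, body_charset):
    """Decode a header, borrowing the body's charset when the header has none.

    body_charset is an assumption: a charset name the caller found in the
    MIME structure, or None if nothing was found.
    """
    if not has_undeclared_8bit(raw_header):
        return raw_header.decode("ascii")
    if body_charset:
        try:
            return raw_header.decode(body_charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or the guess did not fit the bytes.
            pass
    # Last resort: keep the ASCII parts and mark the rest as unreadable.
    return raw_header.decode("ascii", errors="replace")
```

 Even in this toy form the guess can be wrong: nothing guarantees that 
the header and the body were written in the same character set.  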
   
 So we're at a point where larger and larger amounts of work are going to 
be required in order to fix smaller and smaller problems.  Having arrived 
at this point, I'm inclined to stop here and move on to addressing other 
users' concerns.  Thoughts on this?  
