Hello,

Such a multi-charset feature would be very difficult for user agents to 
support.  I work on Internet Explorer's "charset" implementation.  Simply put, 
in the implementation of the "charset" META tag, data is (1) placed in a 
buffer, (2) scanned for a charset META tag in the head, (3) decoded based on 
that charset, (4) tokenized by a lexer and (5) the tokens are parsed into some 
internal representation.  While user agent implementations vary and I've 
greatly simplified things, it's still useful to think of the processing in 
these terms.
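
To make the phasing concrete, here is a rough Python sketch of the five steps.  
It is an illustration only, not how Internet Explorer or any other user agent 
is actually implemented: the names sniff_charset, TagCollector and process, the 
1024-byte scan window and the cp1252 fallback are all assumptions made for the 
example, and Python's html.parser stands in for the lexer and parser.

```python
import re
from html.parser import HTMLParser

# Assumption: the prescan only looks for an ASCII-compatible "charset=" inside a META tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_charset(buffer: bytes, default: str = "cp1252") -> str:
    """Phase 2: scan the start of the raw buffer for a charset META tag."""
    match = META_CHARSET.search(buffer[:1024])
    return match.group(1).decode("ascii") if match else default


class TagCollector(HTMLParser):
    """Phases 4 and 5: tokenize the text and record a trivial internal representation."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def process(buffer: bytes):
    """Phase 1 is the caller handing us the raw bytes in 'buffer'."""
    charset = sniff_charset(buffer)                   # phase 2: find the charset
    text = buffer.decode(charset, errors="replace")   # phase 3: decode once, up front
    parser = TagCollector()
    parser.feed(text)                                 # phases 4 and 5: lex and parse
    parser.close()
    return charset, parser.events


if __name__ == "__main__":
    page = (
        b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=utf-8"></head>'
        b"<body><p>caf\xc3\xa9</p></body></html>"
    )
    print(process(page))
```

The point of the sketch is that decoding happens once, before any tokenizing, 
so the lexer and parser never see raw bytes.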

The difficulty with this suggestion is that the user agent could identify and 
handle charset attributes only [I] during the parsing phase (5) 
or [II] by adding an early scan that violates the above phasing.  If discovered 
during parsing, the appropriate portion of the file would have to be reset back 
to phase (3), decoded again, retokenized and reparsed.  On the other hand, the 
user agent could attempt to find such attributes at a much earlier phase.  
There are many encodings that have problematic characteristics such as 
characters from the ASCII range (used for elements, attributes, etc.) not being 
encoded as ASCII characters in the stream or shift states that change the 
interpretation of bytes.  As a result, these early scans are very difficult and 
expensive (in terms of performance) to implement.  For either approach, 
this kind of reinterpretation of data lends itself to security attacks (I could 
hide script in a different encoding from the rest of the file), makes XSS 
filtering much more difficult, hurts performance and ultimately leads to 
inconsistent implementations across the different user agents.
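
As a small illustration of the encoding and security points, the Python sketch 
below runs a byte-level check for "<script" over two payloads.  The function 
naive_script_filter is a hypothetical stand-in for an early ASCII scan or a 
simple XSS filter, not any real product's filter; UTF-7 and UTF-16 are just two 
example encodings in which the markup characters do not appear as their ASCII 
bytes.

```python
def naive_script_filter(data: bytes) -> bool:
    """Hypothetical byte-level scan: True if an ASCII '<script' sequence is present."""
    return b"<script" in data.lower()


# '<' and '>' written as UTF-7 base64 runs ("+ADw-" and "+AD4-").
UTF7_PAYLOAD = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Every character takes two bytes, so the ASCII byte sequence never appears.
UTF16_PAYLOAD = "<script>alert(1)</script>".encode("utf-16-le")

if __name__ == "__main__":
    for codec, payload in (("utf-7", UTF7_PAYLOAD), ("utf-16-le", UTF16_PAYLOAD)):
        flagged = naive_script_filter(payload)
        decoded = payload.decode(codec)
        print(f"{codec}: flagged={flagged}, decodes to {decoded!r}")
```

Neither payload is flagged by the byte scan, yet both decode to the same script 
element, which is why a per-element charset would force filters to track the 
encoding of every region of the file.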

I'd recommend against such a multi-charset feature.  Servers that compose files 
together need to make their encoding consistent in the rendered composite file.  
The same holds true for composition which occurs on the client.  Thanks,

Harley Rosnow
Internet Explorer Development
Microsoft Corporation

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of hrhrhr hahaha
Sent: Wednesday, April 09, 2008 7:42 AM
To: [email protected]
Subject: charset


Hi,

IF a page has its charset set in the head section, via meta tag, and elsewhere, 
within the same page is another charset used, aside from wrapping it in an a 
tag/element, what would you use? Maybe, alongside lang (and xml:lang) the 
charset 'could' be added to span? Even using utf-8, there could be a charset 
used NOT in or recognised by utf, that could be 'added' to the page via inline 
tag/element?