Re: [whatwg] Character encoding of document.open()ed documents

And Clover Wed, 31 Mar 2010 20:26:54 -0700

Henri Sivonen wrote:

Spec change request: Please change the spec to say that document.open()
sets the document's character encoding to UTF-8

+1. UTF-16 is a troublesome encoding for [X]HTML[5] documents and should be consistently discouraged; because it is not a superset of ASCII, it interacts very poorly with byte interfaces in HTTP and form submissions.
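To illustrate the ASCII-superset point, here is a minimal Python sketch. The sample string and variable names are my own, chosen only for illustration; it simply compares the bytes a byte-oriented interface would see under each encoding.

```python
# Hypothetical sample: an ASCII field name followed by a non-ASCII value.
text = "name=caf\u00e9"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # explicit byte order, no BOM

ascii_prefix = "name=".encode("ascii")

# In UTF-8, ASCII characters keep their single-byte ASCII values,
# so byte-level code looking for "name=" still finds it.
utf8_keeps_ascii = utf8_bytes.startswith(ascii_prefix)

# In UTF-16, every ASCII character is padded with a zero byte,
# so the same byte-level search fails and NUL bytes appear in the stream.
utf16_keeps_ascii = utf16_bytes.startswith(ascii_prefix)
utf16_has_nul = b"\x00" in utf16_bytes

print(utf8_bytes)
print(utf16_bytes)
```

Anything that scans for ASCII delimiters at the byte level (header parsers, `application/x-www-form-urlencoded` splitters) works on the first byte string and breaks on the second.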

No browser will actually try to submit a form as UTF-16 for this reason, but it still causes problems. For example, Firefox misleadingly sets the `_charset_` hack field to 'UTF-16' even though the submission is UTF-8-encoded.
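A rough Python sketch of why that mismatch hurts, assuming a hypothetical server that trusts the `_charset_` field when decoding the body. The field name `q` and its value are invented for the example; only the `_charset_` behaviour comes from the description above.

```python
from urllib.parse import urlencode, parse_qs

# The body is percent-encoded from UTF-8 bytes, but the _charset_
# field claims 'UTF-16' (the mismatch described above).
body = urlencode({"q": "caf\u00e9", "_charset_": "UTF-16"}, encoding="utf-8")

# Hypothetical server A: ignores the label and decodes as UTF-8.
decoded_as_utf8 = parse_qs(body, encoding="utf-8")

# Hypothetical server B: believes the label and decodes as UTF-16.
decoded_as_label = parse_qs(body, encoding="utf-16", errors="replace")

print(body)
print(decoded_as_utf8["q"][0])
print(repr(decoded_as_label["q"][0]))
```

Server A recovers the submitted value; server B gets mojibake, because the percent-decoded bytes were never UTF-16 in the first place.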

even though the parser operates on UTF-16 DOMStrings.

The term 'UTF-16' can mean two very different things: either a sequence of 16-bit code units (as in DOMString), or a sequence of bytes which, taken as UTF-16LE or UTF-16BE, represent 16-bit code units. Unicode's tradition of conflating the meanings of the code unit sequence and the byte sequence has caused much confusion.
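The distinction can be made concrete with a small Python sketch. The helper `utf16_code_units` is my own name, not a standard API; it mimics what a DOMString holds, while the two `encode` calls show the byte sequences.

```python
import struct

def utf16_code_units(s):
    """Return the 16-bit code units of s, as a DOMString would hold them."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# 'A' followed by U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP,
# so it needs a surrogate pair in UTF-16).
text = "A\U0001D11E"

units = utf16_code_units(text)      # one abstract sequence of integers
le_bytes = text.encode("utf-16-le") # one possible byte serialisation
be_bytes = text.encode("utf-16-be") # another, with the bytes swapped

print([hex(u) for u in units])
print(le_bytes.hex())
print(be_bytes.hex())
```

The code unit sequence is the same in both cases; only once a byte order is chosen does a byte sequence exist, and there are two of them.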

DOM Level 3 LS made the mistake of saying that because DOMStrings are UTF-16-code-units, XML documents parsed from `LSInput.characterStream`/`StringData` should receive the `encoding` 'UTF-16', as if the parser has done a conversion from UTF-16-bytes to characters, though no such process has actually taken place. Consequently when you serialise a document parsed from a string in DOM Level 3 LS you get an unexpected and unwanted UTF-16 document.


--
And Clover
mailto:[email protected]
http://www.doxdesk.com/
