Henri Sivonen wrote:
Spec change request: Please change the spec to say that document.open() sets the document's character encoding to UTF-8
+1. UTF-16 is a troublesome encoding for [X]HTML[5] documents and should be consistently discouraged; because it is not a superset of ASCII, it interacts very poorly with the byte-oriented interfaces in HTTP and form submissions.
No browser will actually try to submit a form as UTF-16 for this reason, but it still causes problems. For example, Firefox misleadingly sets the `_charset_` hack field to 'UTF-16' even though the submission is UTF-8-encoded.
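To illustrate the ASCII-superset point, here is a minimal Python sketch. The helper name `ascii_bytes_preserved` and the sample form string are made up for illustration; the only thing relied on is Python's standard `str.encode`.

```python
def ascii_bytes_preserved(encoding, sample="name=value&x=1"):
    """Return True if `encoding` maps ASCII text to the same bytes as ASCII does.

    Hypothetical helper for illustration; `sample` is an arbitrary
    form-style string containing only ASCII characters.
    """
    return sample.encode(encoding) == sample.encode("ascii")


# UTF-8 leaves ASCII text byte-for-byte unchanged, so byte-oriented
# consumers (HTTP headers, form parsers) can still find '=' and '&'.
utf8_ok = ascii_bytes_preserved("utf-8")

# UTF-16 does not: every ASCII character becomes two bytes.
utf16_ok = ascii_bytes_preserved("utf-16-le")

# In UTF-16LE each ASCII character is followed by a zero byte.
utf16_form = "a=1".encode("utf-16-le")

print(utf8_ok, utf16_ok, utf16_form)
```

Anything that scans the byte stream for ASCII delimiters will trip over the interleaved zero bytes in the UTF-16 form, which is presumably why no browser attempts such a submission.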
even though the parser operates on UTF-16 DOMStrings.
The term 'UTF-16' can mean two very different things: either a sequence of 16-bit code units (as in DOMString), or a sequence of bytes which, taken as UTF-16LE or UTF-16BE, represents 16-bit code units. Unicode's tradition of conflating the code unit sequence with the byte sequence has caused much confusion.
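The distinction can be sketched in Python. The function `utf16_code_units` below is a hypothetical helper written for this illustration, not part of any DOM API: it computes the 16-bit code units directly from code points, with no bytes involved. Byte order only enters the picture once those code units are serialised.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers.

    Hypothetical helper: this is the abstract 16-bit view (what a
    DOMString holds), with no byte order involved.
    """
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Code points in the Basic Multilingual Plane are one code unit.
            units.append(cp)
        else:
            # Supplementary code points become a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
    return units


text = "A\U0001D11E"  # 'A' followed by U+1D11E MUSICAL SYMBOL G CLEF

# One code unit sequence...
units = utf16_code_units(text)

# ...but two different byte sequences, depending on the byte order chosen.
le_bytes = text.encode("utf-16-le")
be_bytes = text.encode("utf-16-be")

print([hex(u) for u in units], le_bytes.hex(), be_bytes.hex())
```

The same three code units yield `41 00 34 d8 1e dd` as UTF-16LE and `00 41 d8 34 dd 1e` as UTF-16BE. A string held in memory as code units has no byte order and hence no 'encoding' in the byte sense.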
DOM Level 3 LS made the mistake of saying that because DOMStrings are sequences of UTF-16 code units, XML documents parsed from `LSInput.characterStream`/`StringData` should receive the `encoding` 'UTF-16', as if the parser had done a conversion from UTF-16 bytes to characters, though no such process has actually taken place. Consequently, when you serialise a document parsed from a string in DOM Level 3 LS, you get an unexpected and unwanted UTF-16 document.
-- And Clover http://www.doxdesk.com/
