Have you checked for a byte order mark (BOM) at the start of the source document? (see http://unicode.org/faq/utf_bom.html#BOM )
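
A quick way to check is to look at the first two bytes of the raw response body. The following is only a minimal sketch in Python; the function name `sniff_utf16_bom` is made up for illustration and `data` is assumed to be the undecoded bytes of the document.

```python
import codecs

def sniff_utf16_bom(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration. It does not try to tell a
    UTF-16LE BOM (FF FE) apart from a UTF-32LE BOM (FF FE 00 00).
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None  # no UTF-16 BOM found
```

If this returns None for the page in question, the document really has no BOM and the byte order has to come from somewhere else.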


On Jun 23, 2009, at 6:42 PM, Kartikaya Gupta wrote:

There's a page (http://www.microsoft.com/windowsmobile/mobile/en-us/totalaccess/software/software/eula-sw-netflix.mspx specifically) that is served with a Content-Type header of "text/html; charset=utf-16" and has no BOM. The references I've seen (RFC 2781, as well as http://unicode.org/faq/utf_bom.html#gen7) say that in this case the content should be assumed to be UTF-16BE. The page, however, is actually in UTF-16LE.

All browsers seem to do some sort of unspecified magic and figure out that the page is in LE. I was wondering if that magic could be described and added to the HTML5 spec so that it covers rendering the above page as expected. According to the draft spec as it stands, I believe that page should be rendered as garbage.
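
To make the question concrete, here is one heuristic that would get this page right. It is purely a guess at the kind of thing a browser might do, not a description of what any browser actually does: for markup that is mostly ASCII, the zero bytes land at odd offsets in UTF-16LE and at even offsets in UTF-16BE. The function name, the sample size, and the fallback to big-endian (per RFC 2781) are all assumptions made for this sketch.

```python
def guess_utf16_endianness(data: bytes, sample_size: int = 1024,
                           default: str = "utf-16-be") -> str:
    """Guess the byte order of BOM-less UTF-16 by counting zero bytes.

    Hypothetical heuristic for illustration only. It works for
    mostly-ASCII content and falls back to `default` on a tie.
    """
    sample = data[:sample_size]
    # Drop a trailing odd byte so the sample holds whole 16-bit code units.
    sample = sample[: len(sample) - len(sample) % 2]

    even_zeros = sample[0::2].count(0)  # zero high bytes if big-endian
    odd_zeros = sample[1::2].count(0)   # zero high bytes if little-endian

    if odd_zeros > even_zeros:
        return "utf-16-le"
    if even_zeros > odd_zeros:
        return "utf-16-be"
    return default  # no evidence either way
```

A heuristic like this will misfire on text that is mostly non-Latin, which is part of why it would be useful to have the actual behaviour written down.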


PS - the page also has a meta tag that says the charset is iso-8859-1. *sigh*
