Have you checked for a byte order mark in the source document? (see http://unicode.org/faq/utf_bom.html#BOM)
--Oliver
On Jun 23, 2009, at 6:42 PM, Kartikaya Gupta wrote:
There's a page (http://www.microsoft.com/windowsmobile/mobile/en-us/totalaccess/software/software/eula-sw-netflix.mspx
specifically) that has a Content-Type header of "text/html;
charset=utf-16" and has no BOM. The references I've seen (RFC 2781,
as well as http://unicode.org/faq/utf_bom.html#gen7) say that this
means the content should be assumed to be UTF-16BE. The page,
however, is actually in UTF-16LE.
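
To make the mismatch concrete, here is a small Python sketch. The sample string is a hypothetical stand-in, not the actual page content; it only shows what happens when BOM-less UTF-16LE bytes are decoded with the big-endian default.

```python
# Hypothetical sample: an ASCII-only fragment encoded as UTF-16LE, no BOM.
sample = "<html>".encode("utf-16-le")

# Decoding with the big-endian default swaps the two bytes of every
# code unit, so each ASCII character turns into an unrelated code point.
as_be = sample.decode("utf-16-be")

# Decoding with the byte order the page actually uses recovers the text.
as_le = sample.decode("utf-16-le")

print(repr(as_be))
print(repr(as_le))
```

Note that the big-endian decode does not raise an error here; it silently yields the wrong characters, which is why the result is garbage rather than a visible failure.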
All browsers seem to do some sort of unspecified magic and figure
out that the page is in LE. I was wondering if that magic could be
described and added to the HTML5 spec so that it covers rendering
the above page as expected. According to the draft spec as it
stands, I believe that page should be rendered as garbage.
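
One heuristic that would give the observed result, sketched in Python. This is purely an assumption for illustration: nothing above says this is what any browser actually does, and the function name guess_utf16_byte_order is made up. The idea is that in mostly-ASCII markup the high byte of each code unit is zero, so the position of the zero bytes hints at the byte order.

```python
import codecs


def guess_utf16_byte_order(data: bytes) -> str:
    """Guess 'utf-16-le' or 'utf-16-be' for bytes labelled only as UTF-16.

    Hypothetical heuristic, not taken from any spec or browser.
    """
    # A BOM, when present, settles the question.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Count zero bytes at even and odd offsets. For mostly-ASCII text,
    # little-endian data has its zeros at odd offsets (low byte first).
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)

    # Ties (including empty input) fall back to big-endian, the default
    # that RFC 2781 gives for unmarked UTF-16.
    return "utf-16-le" if odd_zeros > even_zeros else "utf-16-be"
```

A guess like this only works when the content is dominated by characters below U+0100, which is usually true of HTML markup but not of arbitrary text.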
Cheers,
kats
PS - the page also has a meta tag that says the charset is
iso-8859-1. *sigh*