http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#document-response-entity-body says: "If final MIME type is text/html let document be Document object that represents the response entity body parsed following the rules set forth in the HTML specification for an HTML parser with scripting disabled. [HTML]"
Since there's presumably no legacy content that uses XHR to read responseXML for text/html (and expects HTML parsing), and since (in Gecko at least) responseText for non-XML responses tries the HTTP charset and falls back on UTF-8, it doesn't seem to make sense to implement full-blown legacy charset craziness for text/html in XHR.

Specifically, it seems to make sense to skip heuristic detection and to use UTF-8 (as opposed to Windows-1252 or a locale-dependent value) as the fallback encoding if there's neither a <meta> nor an HTTP charset. UTF-8 is the pre-existing fallback for responseText, and responseText is already used with text/html.

As it stands, the XHR2 spec defers to a part of HTML that has legacy-oriented optional features. It seems to make sense to clamp down on those options for XHR.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
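For concreteness, here is a minimal sketch of the clamped-down encoding selection proposed above, written under several assumptions. The function names are hypothetical. The <meta> check is a simplified regex over the first 1024 bytes, not HTML's full prescan algorithm. Labels are resolved through Python's codec registry rather than the WHATWG label table, and BOM handling is left out. The order is: a recognized HTTP charset, then a <meta> charset, then UTF-8, with no heuristic sniffing and no locale-dependent default.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Simplified stand-in (an assumption, not the HTML prescan algorithm):
# find charset=... inside a <meta ...> tag.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:\-]+)""",
    re.IGNORECASE,
)


def _known(label: Optional[str]) -> Optional[str]:
    """Return Python's canonical codec name for a label, or None if unknown."""
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def http_charset(content_type: str) -> Optional[str]:
    """charset parameter of the Content-Type header, if recognized."""
    msg = Message()
    msg["Content-Type"] = content_type
    return _known(msg.get_content_charset())


def meta_charset(body: bytes) -> Optional[str]:
    """charset declared in a <meta> tag within the first 1024 bytes."""
    match = _META_CHARSET.search(body[:1024])
    return _known(match.group(1).decode("ascii")) if match else None


def xhr_html_encoding(content_type: str, body: bytes) -> str:
    """Pick the decoder for a text/html XHR response under the proposed rules."""
    # 1. A recognized HTTP charset wins.
    # 2. Otherwise a <meta> charset in the document.
    # 3. Otherwise UTF-8, matching the responseText fallback:
    #    no heuristic detection, no Windows-1252 or locale-dependent default.
    return http_charset(content_type) or meta_charset(body) or "utf-8"
```

A caller would then decode with something like `body.decode(xhr_html_encoding(content_type, body), errors="replace")` before handing the text to an HTML parser with scripting disabled. An unrecognized HTTP charset label simply falls through to the next step. This is one reasonable reading of the proposal, not a statement of what any browser does.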