On Mon, Sep 26, 2011 at 12:46 PM, Jonas Sicking <[email protected]> wrote:
> On Fri, Sep 23, 2011 at 1:26 AM, Henri Sivonen <[email protected]> wrote:
>> On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking <[email protected]> wrote:
>>> I agree that there are no legacy requirements on XHR here, however I
>>> don't think that that is the only thing that we should look at. We
>>> should also look at what makes the feature the most useful. A extreme
>>> counter-example would be that we could let XHR refuse to parse any
>>> HTML page that didn't pass a validator. While this wouldn't break any
>>> existing content, it would make HTML-in-XHR significantly less useful.
>>
>> Applying all the legacy text/html craziness to XHR could break current
>> use of XHR to retrieve responseText of text/html resources (assuming
>> that we want responseText for text/html work like responseText for XML
>> in the sense that the same character encoding is used for responseText
>> and responseXML).
>
> This doesn't seem to only be a problem when using "crazy" parts of
> text/html charset detection. Simply looking for <meta charset> in the
> first 1024 characters will change behavior and could cause page
> breakage.
>
> Or am I missing something?

Yes: WebKit already performs the <meta> prescan for text/html when
retrieving responseText via XHR even though it doesn't support full
HTML parsing in XHR (so responseXML is still null).
http://hsivonen.iki.fi/test/moz/xhr/charset-xhr.html

Thus, apps broken by the meta prescan would already be broken in WebKit
(unless, of course, they browser sniff in a very strange way). And apps
that wouldn't be OK with using UTF-8 as the fallback encoding when
there's no HTTP-level charset, no BOM and no <meta> in the first 1024
bytes would already be broken in Gecko.

>> Applying all the legacy text/html craziness to XHR would make data
>> loading in programs fail in subtle and hard-to-debug ways depending on
>> the browser localization and user settings. At least when loading into
>> a browsing context, there's visual feedback of character misdecoding
>> and the feedback can be attributed back to a given file. If
>> setting-dependent misdecoding happens in the XHR data loading
>> machinery of an app, it's much harder to figure out what part of the
>> system the problem should be attributed to.
>
> Could you provide more detail here. How are you imagining this data
> being used such that it's not being displayed to the user.
>
> I.e. can you describe an application that would break in a non-visual
> way and where it would be harder to detect where the data originated
> from compared to for example <iframe> usage.

If a piece of text came from XHR and got injected into a visible DOM,
it's not immediately obvious which HTTP response it came from.

-- 
Henri Sivonen
[email protected]
http://hsivonen.iki.fi/
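
For illustration only, the decision order mentioned above (HTTP-level
charset, then BOM, then a <meta> in the first 1024 bytes, then UTF-8 as
the fallback) could be sketched roughly as below. This is not any
browser's actual code: the function name pick_encoding, the BOM table
and the simplified regular expression are all assumptions made for the
example, and a real prescan is considerably more involved.

```python
import re
from typing import Optional

# Simplified stand-in for a <meta> prescan (an assumption for this sketch;
# a real prescan handles attributes, comments and quoting far more carefully).
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([a-zA-Z0-9_.:-]+)""",
    re.IGNORECASE,
)

# Byte order marks checked in this sketch, with the encoding each implies.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
)


def pick_encoding(body: bytes, http_charset: Optional[str] = None) -> str:
    """Return a lowercase encoding label for decoding `body`.

    Hypothetical helper: checks the HTTP-level charset, then a BOM, then a
    <meta> within the first 1024 bytes, and falls back to UTF-8 with no
    locale- or user-setting-dependent guessing.
    """
    if http_charset:
        return http_charset.lower()
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

With this sketch, a response that has no HTTP-level charset, no BOM and
no <meta> in its first 1024 bytes always decodes as UTF-8, regardless
of browser localization or user settings.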
