Re: [XHR] responseType json
On Fri, 06 Jan 2012 17:20:04 +0100, Glenn Maynard gl...@zewt.org wrote: Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Encoding: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. I don't think there's a single media type parameter that causes fatal error handling so I do not think that would be a good idea. E.g. text/event-stream;charset=hz-gb-2312 will give you utf-8 decoding too. text/html;charset=foobar will give you whatever is the default in HTML, etc. -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
On 2012-01-06 22:58, Tab Atkins Jr. wrote: ... RFC4627, for example, is six years old. This was right about the beginning of the time when UTF-8 everywhere, dammit was really starting to gain hold as a reasonable solution to encoding hell. Crockford, as well, is not a browser dev, nor is he closely connected to browser devs in a capacity that would really inform him of why supporting multiple encodings on the web is so painful. So, looking to that RFC for guidance on current best-practice is not a good idea. ... This is misleading. RFC 4627 is *written* by Douglas, but not owned by him. There is a change procedure in place. If you really really believe something needs to change, use it. (First step would be to subscribe to IETF apps-discuss and explain the problem.) Best regards, Julian
Re: [XHR] responseType json
On 2012-01-07 10:48, Anne van Kesteren wrote: On Fri, 06 Jan 2012 17:20:04 +0100, Glenn Maynard gl...@zewt.org wrote: Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Encoding: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. I don't think there's a single media type parameter that causes fatal error handling so I do not think that would be a good idea. E.g. text/event-stream;charset=hz-gb-2312 will give you utf-8 decoding too. text/html;charset=foobar will give you whatever is the default in HTML, etc. charset is undefined on application/json, so ignoring it is the right thing. text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as far as I understand the spec), so if this defaults to UTF-8 this is just an effect of the specified error handling. Best regards, Julian
Re: [XHR] responseType json
On Sat, 07 Jan 2012 11:30:42 +0100, Julian Reschke julian.resc...@gmx.de wrote: charset is undefined on application/json, so ignoring it is the right thing. text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as far as I understand the spec), so if this defaults to UTF-8 this is just an effect of the specified error handling. I guess so. FWIW, the theory that 'charset' is defined for certain media types and not for others is not necessarily implemented that way. E.g. XMLHttpRequest text decoding just searches for a 'charset' parameter regardless of what the media type is. Not sure if that is the only context in which implementations diverge from the theoretical model (the theoretical model is kind of impossible to work with for generic code). -- Anne van Kesteren http://annevankesteren.nl/
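A minimal sketch of the generic behaviour Anne describes, where the decoder looks for a 'charset' parameter without checking whether the media type defines one. The function name charsetFromContentType is hypothetical and the parsing is deliberately simplified; it is not taken from any browser's source.

```typescript
// Hypothetical helper: returns the lower-cased 'charset' parameter of a
// Content-Type header value, or null if there is none. The media type
// itself is never inspected, which is the point being illustrated.
// Simplification: splitting on ";" does not handle quoted values that
// themselves contain a semicolon.
function charsetFromContentType(contentType: string): string | null {
  // "type/subtype; name=value; ..." -> drop the type, keep the parameters.
  const params = contentType.split(";").slice(1);
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== "charset") continue;
    let value = param.slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (
      value.length >= 2 &&
      value.charAt(0) === '"' &&
      value.charAt(value.length - 1) === '"'
    ) {
      value = value.slice(1, -1);
    }
    return value.length > 0 ? value.toLowerCase() : null;
  }
  return null;
}
```

Under this sketch, application/json; charset=Shift_JIS and text/event-stream;charset=hz-gb-2312 both yield a charset value; whether the caller then honours, ignores, or rejects it is the question under debate in this thread.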
Re: [XHR] responseType json
On 2012-01-07 15:15, Anne van Kesteren wrote: On Sat, 07 Jan 2012 11:30:42 +0100, Julian Reschke julian.resc...@gmx.de wrote: charset is undefined on application/json, so ignoring it is the right thing. text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as far as I understand the spec), so if this defaults to UTF-8 this is just an effect of the specified error handling. I guess so. FWIW, the theory that 'charset' is defined for certain media types and not for others is not necessarily implemented that way. E.g. XMLHttpRequest text decoding just searches for a 'charset' parameter regardless of what the media type is. Not sure if that is the only context in which implementations diverge from the theoretical model (the theoretical model is kind of impossible to work with for generic code). For text/* this is ok. For others, maybe not. It's still better than having to special-case each and every media type... Best regards, Julian
Re: [XHR] responseType json
On Sat, 07 Jan 2012 02:55:15 +0100, Jarred Nicholls jar...@webkit.org wrote: Not exact, but close. For discussion's sake and in this context, you could call it the Unicode text decoder that does BOM detection and switches Unicode codecs automatically. For enforced UTF-8 I'd (have to) disable the BOM detection, but additionally could avoid decoding altogether if the specified encoding is not explicitly UTF-8 (and that was a part of the spec). We'll make it work either way :) FYI, if WebKit cannot do pure UTF-8 decoding (i.e. ignoring everything else), WebKit has bugs in its server-sent events (EventSource), Web Workers, WebVTT, and Web Sockets implementation. Potentially more, I'm not sure if this list is still complete. -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote: But, if the browser does not support UTF-32, then the table in step (4) of [1] is supposed to apply, which would interpret the initial two bytes FF FE as UTF-16LE according to the current language of [1], and further, return a confidence level of certain. I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16LE only if the following two bytes are not 00. That wouldn't actually bring browsers and the spec closer together; it would actually bring them further apart. At first glance, it looks like it makes the spec allow WebKit and IE's behavior, which (unfortunately) includes UTF-32 detection, by allowing them to fall through to step 7, where they're allowed to detect things however they want. However, that's ignoring step 5. If step 4 passes through, then step 5 would happen next. That means this carefully-constructed file would be detected as UTF-8 by step 5: http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding That's not what happens in any browser; FF detects it as UTF-16 and WebKit and IE detect it as UTF-32. This change would require it to be detected as UTF-8, which would have security implications if implemented, e.g. a page outputting escaped user-inputted text in UTF-32 might contain a string like this, followed by a hostile script, when interpreted as UTF-8. This really isn't worth spending time on; you're free to press this if you like, but I'm moving on. -- Glenn Maynard I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is.
It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. As someone else stated, this is a good fight but probably not the right battlefield.
Re: [XHR] responseType json
Please be careful with quote markers; you quoted text written by me as written by Glenn Adams. On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote: I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is. It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. If this behavior gets propagated into other browsers, that's even worse. Gecko doesn't support UTF-32, and adding it would be a huge step backwards. do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. As someone else stated, this is a good fight but probably not the right battlefield. Strongly disagree. 
Preventing legacy messes from being perpetuated into new APIs is one of the *only* battlefields available, where we can get people to stop using legacy encodings without breaking existing content. Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Encoding: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. -- Glenn Maynard
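For reference, the rudimentary detection in rfc4627 sec3 mentioned above works from the pattern of zero octets in the first four bytes, relying on the first two characters of a JSON text being ASCII. A minimal sketch follows; the function and type names are hypothetical, and the handling of inputs shorter than four bytes is an assumption, since the RFC's table only covers four-octet patterns.

```typescript
type JsonEncoding = "UTF-8" | "UTF-16BE" | "UTF-16LE" | "UTF-32BE" | "UTF-32LE";

// Hypothetical helper implementing the RFC 4627 section 3 table:
//   00 00 00 xx  UTF-32BE
//   00 xx 00 xx  UTF-16BE
//   xx 00 00 00  UTF-32LE
//   xx 00 xx 00  UTF-16LE
//   xx xx xx xx  UTF-8
function detectJsonEncoding(bytes: ArrayLike<number>): JsonEncoding {
  // Assumption: fewer than four bytes falls back to UTF-8.
  if (bytes.length < 4) return "UTF-8";
  const z0 = bytes[0] === 0;
  const z1 = bytes[1] === 0;
  const z2 = bytes[2] === 0;
  const z3 = bytes[3] === 0;
  if (z0 && z1 && z2 && !z3) return "UTF-32BE";
  if (z0 && !z1 && z2 && !z3) return "UTF-16BE";
  if (!z0 && z1 && z2 && z3) return "UTF-32LE";
  if (!z0 && z1 && !z2 && z3) return "UTF-16LE";
  // Any other pattern is treated as UTF-8.
  return "UTF-8";
}
```

Note that this inspects only the payload; it does not consult HTTP headers or overrideMimeType, which is why mixing it with those mechanisms raises a precedence question.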
Re: [XHR] responseType json
On 2012-01-06 17:20, Glenn Maynard wrote: ... Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. ... You seem to feel strongly about this (and I might agree for UTF-32). How about raising this issue in a place where there's an actual chance to cause changes? (- IETF apps-discuss) Best regards, Julian
Re: [XHR] responseType json
On 1/6/12 11:20 AM, Glenn Maynard wrote: Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. I assume Jarred was talking about interoperability with content, not with other browsers. And thus start most races to the bottom in web-land -Boris
Re: [XHR] responseType json
On 2012-01-06 17:56, Boris Zbarsky wrote: On 1/6/12 11:20 AM, Glenn Maynard wrote: Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. I assume Jarred was talking about interoperability with content, not with other browsers. And thus start most races to the bottom in web-land One could argue that it isn't a race to the bottom when the component accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that. Best regards, Julian
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 11:20 AM, Glenn Maynard gl...@zewt.org wrote: Please be careful with quote markers; you quoted text written by me as written by Glenn Adams. Sorry, copying from the archives into Gmail is a pain. On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote: I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is. It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we Accepting content that other browsers don't will result in pages being created that work only in WebKit. WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. That gives the least interoperability, not the most. If this behavior gets propagated into other browsers, that's even worse. Gecko doesn't support UTF-32, and adding it would be a huge step backwards. We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time. do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. 
If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. So let's change the IETF spec as well - are we even fighting that battle yet? Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. As someone else stated, this is a good fight but probably not the right battlefield. Strongly disagree. Preventing legacy messes from being perpetuated into new APIs is one of the *only* battlefields available, where we can get people to stop using legacy encodings without breaking existing content. without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON. Well, which is it? Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented. I suggest taking this initiative elsewhere (at least in parallel), i.e., getting RFC4627 to only support UTF-8 encoding if that's the larger picture. 
To say that a legit JSON source can be stored as any Unicode encoding but can only be transported as UTF-8 in this one particular XHR case is inconsistent and only leads to worse interoperability and confusion to those looking up these specs - if I go to JSON spec first, I'll see all those encodings are supported and wonder why it doesn't work in this one instance. Are we out to totally confuse the hell out of authors? Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Encoding: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. +1 -- Glenn Maynard
Re: [XHR] responseType json
On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. -Boris
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 3:18 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. I think my entire mail was quite clear that the spec is inconsistent with rfc4627 and perhaps that's where the changes need to happen, or else yield to it. Let's not be dogmatic here, I'm just pointing out the obvious disconnect. 
This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and often times it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). Let's get Crockford on our side, let him know there's a lot of support for banishing UTF-16 and UTF-32 forever and change rfc4627. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. IE and WebKit have gracefully handled UTF-32 for a long time in other parts of the platform, and despite it being an unsupported codec of the HTML spec, they've continued to do so. I've had nothing to do with this, so I'm not to be held responsible for its present perpetuation ;) My argument is focused around the JSON media type's spec, which blatantly contradicts. -Boris -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. With this kind of attitude, frankly, you shouldn't be implementing a spec. HTH Ms2ger
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:18 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. 
I'm ambivalent about whether we should restrict to utf8 or not. On the one hand, having everyone on utf8 would greatly simplify the web. On the other hand, I can imagine this hurting download size for Japanese/Chinese websites (i.e. they'd want utf-16). I agree with Boris that we don't need to pollute the web if we want to expose this to WebKit's walled-garden environments. We have mechanisms for exposing things only to those environments specifically to avoid this problem. Let's keep this discussion focused on what's best for the web. We can make WebKit do the appropriate thing.
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote: On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. With this kind of attitude, frankly, you shouldn't be implementing a spec. I resent that comment, because I'm one of the few that fight in WebKit to get us 100% spec compliant in XHR (don't even get me started with how many violations there are in Firefox, IE, and Opera...WebKit isn't the only one mind you), but that doesn't mean any spec addition, as fluid as it is in the early stages, is gospel. In this case I simply think it wasn't debated enough before going in - actually it wasn't debated at all, it was just placed in there and now I'm a bad guy for pointing out its disconnect? I think your attitude is far poorer. The web platform changes all the time - if this matter is shored up, then implementations will change accordingly. HTH Ms2ger
Re: [XHR] responseType json
* Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and often times it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). You have yet to explain how you propose Webkit should behave, and it is rather unclear to me whether the proposed behavior is in line with the existing HTTP, MIME, and JSON specifications. A HTTP response with Content-Type: application/json;charset=iso-8859-15 for instance must not be treated as ISO-8859-15 encoded as there is no charset parameter for the application/json media type, and there is no other reason to treat it as ISO-8859-15, so it's either an error, or you silently ignore the unrecognized parameter. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:58 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: Long experience shows that people who say things like I'm going to code against the Rec instead of the draft, because the Rec is more stable I know that's a common error, but I never said I was going against a Rec. My point was that the editor's draft is fluid enough that it can be debated and changed, as it's clearly not perfect at any point in time. Debating a change to it doesn't put anyone in the wrong, and certainly doesn't mean I'm violating it - because tomorrow, my proposed violation could be the current state of the spec. RFC4627, for example, is six years old. This was right about the beginning of the time when UTF-8 everywhere, dammit was really starting to gain hold as a reasonable solution to encoding hell. Crockford, as well, is not a browser dev, nor is he closely connected to browser devs in a capacity that would really inform him of why supporting multiple encodings on the web is so painful. So, looking to that RFC for guidance on current best-practice is not a good idea. This issue has been debated and argued over for a long time, far predating the current XHR bit. There's a reason why new file formats produced in connection with web stuff are utf8-only. It's good for the web if we're consistent about this. ~TJ
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:54 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and often times it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). You have yet to explain how you propose Webkit should behave, and it is rather unclear to me whether the proposed behavior is in line with the existing HTTP, MIME, and JSON specifications. A HTTP response with Content-Type: application/json;charset=iso-8859-15 for instance must not be treated as ISO-8859-15 encoded as there is no charset parameter for the application/json media type, and there is no other reason to treat it as ISO-8859-15, so it's either an error, or you silently ignore the unrecognized parameter. I think the spec should clarify this. I agree with Glenn Maynard's proposal: if a server sends a specific charset to use that isn't UTF-8, we should explicitly reject it, never decode or parse the text and return null. Silently decoding in UTF-8 when the server or author is dictating something different could cause confusion. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
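A minimal sketch of the rejection behaviour proposed above, assuming the declared charset has already been extracted from the server's header (or overrideMimeType) and the body has already been decoded as UTF-8. The function name jsonResponseBody is hypothetical, and this is an illustration of the proposal, not the text of any spec or implementation.

```typescript
// Hypothetical helper: computes the JSON response entity body under the
// proposal that an explicit non-UTF-8 charset is rejected, not ignored.
// Simplification: only the exact label "utf-8" (case-insensitive) is
// accepted; other labels for UTF-8 are not considered here.
function jsonResponseBody(declaredCharset: string | null, utf8Text: string): unknown {
  // An explicit charset other than UTF-8 means: never parse, return null.
  if (declaredCharset !== null && declaredCharset.trim().toLowerCase() !== "utf-8") {
    return null;
  }
  try {
    return JSON.parse(utf8Text);
  } catch (_err) {
    // Assumption: a JSON syntax error also yields null.
    return null;
  }
}
```

With this shape, jsonResponseBody("Shift_JIS", text) is null regardless of the payload, while a missing charset falls through to ordinary UTF-8 parsing.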
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. If you support UTF-16 here, then people will use it. That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it. I don't know why we have to keep repeating this mistake. We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time. Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == json API. Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case. XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer. I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8. This means the decoding would still take place for other encodings, but the end result would be discarded by XHR. This puts the handling for this restriction within the XHR layer, rather than at the decoder layer. I said: Also, I'm a bit confused. 
You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is. without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON. Well, which is it? This is a new feature; there isn't yet existing content using a responseType of json to be broken. Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility. That's why it's being done more narrowly, where it can be done without breaking existing pages. If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them. Failing that, we should do what we can where we can. and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented. Of course not. 
It tells the developer that something's wrong, and he has the choice of working around it or fixing his service. If just 25% of those people make the right choice, this is a win. It also helps discourage new services from being written using legacy encodings. We can't stop people from doing the wrong thing, but that doesn't mean we shouldn't point people in the right direction. This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is the worst thing I've seen anyone say in here in a long time. On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke julian.resc...@gmx.de wrote: One could argue that it isn't a race to the bottom when the component accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that. First off, it's common and perfectly normal for an API exposing features from another spec to explicitly limit the allowed profile of that spec. Saying "JSON through this API must be UTF-8" is perfectly OK. Second, this
Re: [XHR] responseType json
Sent from my iPhone On Jan 6, 2012, at 7:11 PM, Glenn Maynard gl...@zewt.org wrote: On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. If you support UTF-16 here, then people will use it. That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it. I don't know why we have to keep repeating this mistake. We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time. Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == json API. Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case. XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer. I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8. This means the decoding would still take place for other encodings, but the end result would be discarded by XHR. This puts the handling for this restriction within the XHR layer, rather than at the decoder layer. 
That's why I'd like to see the spec changed to clarify the discarding if the encoding was supplied and isn't UTF-8. I said: Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is. None of that matters if a specific codec is the one all be all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc. the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case unless the spec added that wording of discarding when encoding != UTF-8, then that can be enforced all in XHR with no decoder changes. I don't want to get hung on explaining WebKit's specific impl. details. without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON. Well, which is it? This is a new feature; there isn't yet existing content using a responseType of json to be broken. Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. 
But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility. That's why it's being done more narrowly, where it can be done without breaking existing pages. If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them. Failing that, we should do what we can where we can. and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented. Of course not. It tells the developer that something's wrong, and he has the choice of working around it or fixing his service. If just 25% of those people make the right choice, this is a
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote: Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is. None of that matters if a specific codec is the one all be all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc. the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case unless the spec added that wording of discarding when encoding != UTF-8, then that can be enforced all in XHR with no decoder changes. I don't want to get hung on explaining WebKit's specific impl. details. All of the details I asked about are user-visible, not WebKit implementation details, and would need to be specified if encodings other than UTF-8 were allowed. I do think this should remain UTF-8 only, but if you want to discuss allowing other encodings, these are things that would need to be defined (which requires a clear proposal, not read the code). I assume it's not using the exact same decoder logic as HTML. After all, that would allow non-Unicode encodings. -- Glenn Maynard
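For readers unfamiliar with the zero-byte detection mentioned above, here is a minimal sketch of the heuristic from RFC 4627 section 3. The first two characters of a JSON text are always ASCII, so the pattern of zero octets in the first four bytes indicates the Unicode encoding form. The function name `detectJsonEncoding` and the fallback for inputs shorter than four bytes are assumptions of this sketch; it is not what WebKit or any other engine actually does, and it deliberately ignores BOMs, HTTP headers, and overrideMimeType.

```javascript
// Sketch of the RFC 4627 section 3 zero-byte heuristic.
// `detectJsonEncoding` is a hypothetical helper, not a browser API.
function detectJsonEncoding(bytes) {
  if (bytes.length < 4) {
    // The RFC's table assumes four octets; treating shorter input as
    // UTF-8 is an assumption made only for this sketch.
    return "utf-8";
  }
  const [a, b, c, d] = bytes;
  if (a === 0 && b === 0 && c === 0 && d !== 0) return "utf-32be"; // 00 00 00 xx
  if (a === 0 && b !== 0 && c === 0 && d !== 0) return "utf-16be"; // 00 xx 00 xx
  if (a !== 0 && b === 0 && c === 0 && d === 0) return "utf-32le"; // xx 00 00 00
  if (a !== 0 && b === 0 && c !== 0 && d === 0) return "utf-16le"; // xx 00 xx 00
  return "utf-8"; // xx xx xx xx
}

// The bytes of `{"` in UTF-16LE start with 7B 00 22 00.
console.log(detectJsonEncoding(new Uint8Array([0x7b, 0x00, 0x22, 0x00])));
```

Note that this only answers one of the questions raised in the thread. The order of precedence between this heuristic, a BOM, and the Content-Type header would still have to be specified separately.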
Re: [XHR] responseType json
On Jan 6, 2012, at 8:10 PM, Glenn Maynard gl...@zewt.org wrote: On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote: Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is. None of that matters if a specific codec is the one all be all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc. the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case unless the spec added that wording of discarding when encoding != UTF-8, then that can be enforced all in XHR with no decoder changes. I don't want to get hung on explaining WebKit's specific impl. details. All of the details I asked about are user-visible, not WebKit implementation details, and would need to be specified if encodings other than UTF-8 were allowed. I do think this should remain UTF-8 only, but if you want to discuss allowing other encodings, these are things that would need to be defined (which requires a clear proposal, not read the code). Of course, I apologize I didn't mean it as a dismissal, I just figured if we are settled on one codec then I'd spare ourselves the time. I'm also mobile :) I could provide you those details if no decoding changes (enforcement) were done in WebKit, if you'd like. But since this is a new API, might as well just stick to UTF-8. I assume it's not using the exact same decoder logic as HTML. After all, that would allow non-Unicode encodings. Not exact, but close. 
For discussion's sake and in this context, you could call it the Unicode text decoder that does BOM detection and switches Unicode codecs automatically. For enforced UTF-8 I'd (have to) disable the BOM detection, but additionally could avoid decoding altogether if the specified encoding is not explicitly UTF-8 (and that was a part of the spec). We'll make it work either way :) -- Glenn Maynard
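The enforcement discussed above could live entirely in the XHR layer. The following is a rough sketch of one reading of that proposal, under stated assumptions: `jsonResponse` and its `declaredCharset` parameter are hypothetical names, the charset comparison only recognizes the label "utf-8", and the proposal itself was still under debate on the list. It skips decoding when an explicit non-UTF-8 charset was supplied, and otherwise decodes as UTF-8 and parses.

```javascript
// Hypothetical model of an XHR-layer check; not taken from any engine.
// `declaredCharset` stands in for the charset parameter of Content-Type
// (or of overrideMimeType); undefined means none was supplied.
function jsonResponse(bytes, declaredCharset) {
  if (declaredCharset !== undefined &&
      declaredCharset.toLowerCase() !== "utf-8") {
    // Explicit, conflicting charset: never decode or parse.
    return null;
  }
  try {
    const text = new TextDecoder("utf-8").decode(bytes);
    return JSON.parse(text);
  } catch (e) {
    // Invalid JSON also yields null, as .response does in the draft.
    return null;
  }
}

const body = new TextEncoder().encode('{"ok":true}');
console.log(jsonResponse(body, "UTF-8"));     // parsed object
console.log(jsonResponse(body, "Shift_JIS")); // null
```

Keeping the check outside the decoder means the shared text decoder needs no special case for this one responseType, which is the design point made in the thread.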
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 1:45 PM, Jarred Nicholls jar...@webkit.org wrote: On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote: On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. With this kind of attitude, frankly, you shouldn't be implementing a spec. I resent that comment, because I'm one of the few that fight in WebKit to get us 100% spec compliant in XHR (don't even get me started with how many violations there are in Firefox, IE, and Opera...WebKit isn't the only one mind you), but that doesn't mean any spec addition, as fluid as it is in the early stages, is gospel. In this case I simply think it wasn't debated enough before going in - actually it wasn't debated at all, it was just placed in there and now I'm a bad guy for pointing out its disconnect? I think your attitude is far poorer. The web platform changes all the time - if this matter is shored up, then implementations will change accordingly. While Ms2ger was a bit short, there's a reason. Long experience shows that people who say things like "I'm going to code against the Rec instead of the draft, because the Rec is more stable" often end up causing pain for everyone else, because that "more stable" Rec is also *more wrong*, precisely because "stable" means "hasn't been updated to take into account new information or to fix bugs". This happens even for smaller differences - well-meaning devs coding to the Working Draft of a spec on /TR instead of the error-corrected Editor's Draft cause never-ending pain. Old RFCs are also often a source of pain, because we quite often find that the authors aren't fully versed in the complexities and subtleties of the public web. They may be operating from an academic or corporate standpoint, or otherwise be contained in a local experience-minimum that affects their view of what's reasonable. RFC4627, for example, is six years old. 
This was right about the beginning of the time when "UTF-8 everywhere, dammit" was really starting to gain hold as a reasonable solution to encoding hell. Crockford, as well, is not a browser dev, nor is he closely connected to browser devs in a capacity that would really inform him of why supporting multiple encodings on the web is so painful. So, looking to that RFC for guidance on current best-practice is not a good idea. This issue has been debated and argued over for a long time, far predating the current XHR bit. There's a reason why new file formats produced in connection with web stuff are utf8-only. It's good for the web if we're consistent about this. ~TJ
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:36 PM, Ojan Vafai o...@chromium.org wrote: I'm ambivalent about whether we should restrict to utf8 or not. On the one hand, having everyone on utf8 would greatly simplify the web. On the other hand, I can imagine this hurting download size for japanese/chinese websites (i.e. they'd want utf-16). Note that this may be subject to the same counter-intuitive forces that cause UTF-8 to usually be better for CJK HTML pages (because a lot of the source is ASCII markup). In JSON, all of the markup artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII, along with numbers, bools, and null. Only the contents of strings can be non-ascii. JSON is generally lighter on markup than XML-like languages, so the effect may not be as pronounced, but it shouldn't be dismissed without some study. At minimum, it will *reduce* the size difference between the two. ~TJ
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:45 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: Note that this may be subject to the same counter-intuitive forces that cause UTF-8 to usually be better for CJK HTML pages (because a lot of the source is ASCII markup). In JSON, all of the markup artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII, along with numbers, bools, and null. Only the contents of strings can be non-ascii. JSON is generally lighter on markup than XML-like languages, so the effect may not be as pronounced, but it shouldn't be dismissed without some study. At minimum, it will *reduce* the size difference between the two. And more fundamentally, this is trying to repurpose charsets as a compression mechanism. If you want compression, use compression (Transfer-Encoding: gzip):
-rw-rw-r-- 1 glenn glenn 7274 Jan 06 23:59 test-utf8.txt
-rw-rw-r-- 1 glenn glenn 3672 Jan 06 23:59 test-utf8.txt.gz
-rw-rw-r-- 1 glenn glenn 6150 Jan 06 23:59 test-utf16.txt
-rw-rw-r-- 1 glenn glenn 3468 Jan 06 23:59 test-utf16.txt.gz
The difference even without compression isn't enough to warrant the complexity (~15%), and with compression the difference is under 10%. (Test case is simply copying the rendered text from http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8 in Firefox.) -- Glenn Maynard
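The size argument in the two messages above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper names and the tiny sample object are made up for the example, and the UTF-16 count assumes no BOM. It measures one small JSON text in both encodings and recomputes the relative differences from the file sizes quoted in the listing.

```javascript
// Byte length of a string when serialized as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Byte length as UTF-16 without a BOM: two bytes per UTF-16 code unit.
function utf16ByteLength(text) {
  return text.length * 2;
}

// How much smaller `smaller` is than `larger`, in percent, to one decimal.
function percentSmaller(larger, smaller) {
  return Math.round(((larger - smaller) / larger) * 1000) / 10;
}

// Hypothetical sample: six katakana characters inside ASCII JSON markup.
const sample = JSON.stringify({
  title: "\u30e1\u30a4\u30f3\u30da\u30fc\u30b8",
  id: 12345,
  ok: true,
});

console.log(utf8ByteLength(sample), utf16ByteLength(sample));

// File sizes taken from the listing above (uncompressed, then gzipped).
console.log(percentSmaller(7274, 6150), percentSmaller(3672, 3468));
```

For this particular sample the ASCII markup dominates, so the UTF-8 form comes out smaller. The listing above used rendered page text with no markup, which is the case most favourable to UTF-16.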
Re: [XHR] responseType json
On Mon, Dec 12, 2011 at 7:08 PM, Jarred Nicholls jar...@sencha.com wrote: There's no feeding (re: streaming) of data to a parser, it's buffered until the state is DONE (readyState == 4) and then an XML doc is created upon the first access to responseXML or response. Same will go for the JSON parser in our first iteration of implementing the json responseType. FWIW, Gecko parses XML and HTML in a streaming way as data arrives from the network. When readyState changes to DONE, the document has already been parsed. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [XHR] responseType json
On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls jar...@sencha.com wrote: I understand that's how you spec'ed it, but it's not how it's implemented in IE nor WebKit for legacy purposes - which is what I meant in the above statement. What do you mean "legacy purposes"? responseType is a new feature. And we added it in this way in part because of feedback from the WebKit community that did not want to keep the raw data around. In the thread where we discussed adding it the person working on it for WebKit did seem to plan on implementing it per the specification: http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799 In WebKit and IE =9, a responseType of "", "text", or "document" means access to both responseXML and responseText. I don't know what IE10's behavior is yet. IE8 could not have supported this feature and for IE9 I could not find any documentation. Are you sure they implemented it? Given that Gecko does the right thing and Opera will too (next major release I believe) I do not really see any reason to change the specification. -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
I'd like to bring up an issue with the spec with regard to responseText + the new "json" responseType. Currently it is written that responseText should throw an exception if the responseType is not "" or "text". I would argue that responseText should also return the plain text when the type is "json". Take the scenario of debugging an application, or an application that has an error-reporting feature: if XHR.response returns null, meaning the JSON payload was not successfully parsed and/or was invalid, there is no means to retrieve the plain text that caused the error. null is rather useless at that point. See my WebKit bug for more context: https://bugs.webkit.org/show_bug.cgi?id=73648 For legacy reasons, responseText and responseXML continue to work together despite the responseType that is set. In other words, a responseType of "text" still allows access to responseXML, and a responseType of "document" still allows access to responseText. And it makes sense that this is so; if a strongly typed Document from responseXML is unable to be created, responseText is the fallback to get the payload and either debug it, submit it as an error report, etc. I would argue that the "json" responseType would be more valuable if it behaved the same. Unlike the binary types (ArrayBuffer, Blob), "json" and "document" are backed by a plain text payload and therefore responseText has value in being accessible. If all we can get on a bad JSON response is null, I think there is little incentive for anyone to use the "json" type when they can use "text" and JSON.parse it themselves. Comments, questions, and flames are welcomed! Thanks, Jarred
Re: [XHR] responseType json
On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com wrote: A good compromise would be to only throw it away (and thus restrict responseText access) upon the first successful parse when accessing .response. I disagree. Even though conceptually, the spec says that you first accumulate text and then you invoke JSON.parse, I think we should allow for implementations that feed an incremental JSON parser as data arrives from the network and throws away each input buffer after pushing it to the incremental JSON parser. That is, in order to allow more memory-efficient implementations in the future, I think we shouldn't expose responseText for JSON. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
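To make the memory argument above concrete, here is a small sketch of consuming a response chunk by chunk without retaining the text. It is not a real incremental JSON parser: the hypothetical `JsonChunkScanner` class only tracks nesting depth and string state to notice when the top-level value closes, and it builds no values and does no validation. It does show the relevant property, which is that each input buffer can be dropped as soon as it has been pushed, so there is nothing left to expose as responseText.

```javascript
// Hypothetical structural scanner; a stand-in for an incremental parser.
class JsonChunkScanner {
  constructor() {
    this.decoder = new TextDecoder("utf-8");
    this.depth = 0;        // current nesting of objects and arrays
    this.inString = false; // inside a JSON string literal
    this.escaped = false;  // previous character was a backslash
    this.complete = false; // top-level object or array has closed
  }

  // Consume one network chunk (Uint8Array). Nothing from the chunk is kept.
  push(chunk) {
    // stream: true lets a multi-byte character span two chunks.
    const text = this.decoder.decode(chunk, { stream: true });
    for (const ch of text) {
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        this.inString = true;
      } else if (ch === "{" || ch === "[") {
        this.depth += 1;
      } else if (ch === "}" || ch === "]") {
        this.depth -= 1;
        if (this.depth === 0) this.complete = true;
      }
    }
  }
}

const scanner = new JsonChunkScanner();
const bytes = new TextEncoder().encode('{"a":"}\u30e1","b":[1,2]}');
scanner.push(bytes.slice(0, 8)); // split in the middle of a 3-byte character
scanner.push(bytes.slice(8));
console.log(scanner.complete);
```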
Re: [XHR] responseType json
On Mon, Dec 12, 2011 at 5:37 AM, Anne van Kesteren ann...@opera.com wrote: On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls jar...@sencha.com wrote: I understand that's how you spec'ed it, but it's not how it's implemented in IE nor WebKit for legacy purposes - which is what I meant in the above statement. What do you mean "legacy purposes"? responseType is a new feature. And we added it in this way in part because of feedback from the WebKit community that did not want to keep the raw data around. I wasn't talking about responseType, I was referring to the pair of responseText and responseXML being accessible together since the dawn of time. I don't know why WebKit and IE didn't take the opportunity to use responseType and kill that behavior; don't ask me, I wasn't responsible for it ;) In the thread where we discussed adding it the person working on it for WebKit did seem to plan on implementing it per the specification: http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799 Clearly not - shame, because now I'm trying to clean up the mess. In WebKit and IE =9, a responseType of "", "text", or "document" means access to both responseXML and responseText. I don't know what IE10's behavior is yet. IE8 could not have supported this feature and for IE9 I could not find any documentation. Are you sure they implemented it? I'm not positive if they did to be honest - I haven't found it documented anywhere. Given that Gecko does the right thing and Opera will too (next major release I believe) I do not really see any reason to change the specification. I started an initiative to bring XHR in WebKit up-to-spec (see https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push back. 
All I'm asking is that if I run into push back again, that I can send them your way ;) -- Anne van Kesteren http://annevankesteren.nl/ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On Mon, Dec 12, 2011 at 6:39 AM, Henri Sivonen hsivo...@iki.fi wrote: On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com wrote: A good compromise would be to only throw it away (and thus restrict responseText access) upon the first successful parse when accessing .response. I disagree. Even though conceptually, the spec says that you first accumulate text and then you invoke JSON.parse, I think we should allow for implementations that feed an incremental JSON parser as data arrives from the network and throws away each input buffer after pushing it to the incremental JSON parser. That is, in order to allow more memory-efficient implementations in the future, I think we shouldn't expose responseText for JSON. I'm completely down with that. It still leaves an unsatisfied use case; but one that, after a nice weekend of relaxation, I no longer care about. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On Mon, 12 Dec 2011 14:12:57 +0100, Jarred Nicholls jar...@sencha.com wrote: I started an initiative to bring XHR in WebKit up-to-spec (see https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push back. All I'm asking is that if I run into push back again, that I can send them your way ;) So a) thanks a lot for doing that and b) please do send them here. Discussing the XMLHttpRequest standard should happen here, not in bugs.webkit.org :-) -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
On 12/12/2011 03:12 PM, Jarred Nicholls wrote: On Mon, Dec 12, 2011 at 5:37 AM, Anne van Kesteren ann...@opera.com wrote: On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls jar...@sencha.com wrote: I understand that's how you spec'ed it, but it's not how it's implemented in IE nor WebKit for legacy purposes - which is what I meant in the above statement. What do you mean "legacy purposes"? responseType is a new feature. And we added it in this way in part because of feedback from the WebKit community that did not want to keep the raw data around. I wasn't talking about responseType, I was referring to the pair of responseText and responseXML being accessible together since the dawn of time.
In case responseType is not set. If responseType is set, implementations can optimize certain things.
I don't know why WebKit and IE didn't take the opportunity to use responseType
responseType is a new thing. Gecko hasn't changed behavior in case responseType is not set.
and kill that behavior; don't ask me, I wasn't responsible for it ;) In the thread where we discussed adding it the person working on it for WebKit did seem to plan on implementing it per the specification: http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799 Clearly not - shame, because now I'm trying to clean up the mess. In WebKit and IE =9, a responseType of "", "text", or "document" means access to both responseXML and responseText. I don't know what IE10's behavior is yet. IE8 could not have supported this feature and for IE9 I could not find any documentation. Are you sure they implemented it? I'm not positive if they did to be honest - I haven't found it documented anywhere. Given that Gecko does the right thing and Opera will too (next major release I believe) I do not really see any reason to change the specification. 
I started an initiative to bring XHR in WebKit up-to-spec (see https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push back. All I'm asking is that if I run into push back again, that I can send them your way ;) -- Anne van Kesteren http://annevankesteren.nl/ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On Mon, Dec 12, 2011 at 9:28 AM, Boris Zbarsky bzbar...@mit.edu wrote: On 12/12/11 8:12 AM, Jarred Nicholls wrote: I started an initiative to bring XHR in WebKit up-to-spec (see https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push back. That seems to be about a different issue than responseType, right? I just tried the following testcase:

    <script>
    var xhr = new XMLHttpRequest();
    xhr.open("GET", window.location, false);
    xhr.responseType = "document";
    xhr.send();
    try { alert(xhr.responseText); } catch (e) { alert(e); }
    try { alert(xhr.responseXML); } catch (e) { alert(e); }
    xhr.open("GET", window.location, false);
    xhr.responseType = "text";
    xhr.send();
    try { alert(xhr.responseText); } catch (e) { alert(e); }
    try { alert(xhr.responseXML); } catch (e) { alert(e); }
    </script>

Gecko behavior seems to be per spec: the attempt to get responseText fails on the first XHR, and the attempt to get responseXML fails on the second XHR. WebKit (tested Chrome dev channel and Safari 5.1.1 behavior) seems to be partially per spec: the attempt to get responseText throws for the first XHR, but the attempt to get the responseXML succeeds for the second XHR. That sort of makes sense in terms of how I recall WebKit implementing feeding data to their parser in XHR, if the implementation of responseType just wasn't very careful. There's no feeding (re: streaming) of data to a parser, it's buffered until the state is DONE (readyState == 4) and then an XML doc is created upon the first access to responseXML or response. Same will go for the JSON parser in our first iteration of implementing the "json" responseType. Given that WebKit already implements the right behavior when responseType = "document", it sounds like the only bug on their end here is really responseType = "text" handling, right? It'd definitely be good to just fix that... Yeah I'm going to clean up all the mess. -Boris Thanks! 
-- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com wrote: I'd like to bring up an issue with the spec with regard to responseText + the new "json" responseType. Currently it is written that responseText should throw an exception if the responseType is not "" or "text". I would argue that responseText should also return the plain text when the type is "json". Take the scenario of debugging an application, or an application that has an error-reporting feature: if XHR.response returns null, meaning the JSON payload was not successfully parsed and/or was invalid, there is no means to retrieve the plain text that caused the error. null is rather useless at that point. See my WebKit bug for more context: https://bugs.webkit.org/show_bug.cgi?id=73648 For legacy reasons, responseText and responseXML continue to work together despite the responseType that is set. In other words, a responseType of "text" still allows access to responseXML, and a responseType of "document" still allows access to responseText. And it makes sense that this is so; if a strongly typed Document from responseXML is unable to be created, responseText is the fallback to get the payload and either debug it, submit it as an error report, etc. I would argue that the "json" responseType would be more valuable if it behaved the same. Unlike the binary types (ArrayBuffer, Blob), "json" and "document" are backed by a plain text payload and therefore responseText has value in being accessible. If all we can get on a bad JSON response is null, I think there is little incentive for anyone to use the "json" type when they can use "text" and JSON.parse it themselves. What's the problem with simply setting responseType to 'text' when debugging? 
A nice benefit of *not* presenting the text by default is that the browser can throw the text away immediately, rather than keeping around the payload in both forms and paying for it twice in memory (especially since the text form will, I believe, generally be larger than the JSON form). 
~TJ
Re: [XHR] responseType json
On Sun, 11 Dec 2011 06:10:26 +0100, Jarred Nicholls jar...@sencha.com wrote: For legacy reasons, responseText and responseXML continue to work together despite the responseType that is set. This is false. responseType text allows access to responseText, but not responseXML. document allows access to responseXML, but not responseText. We made this exclusive to reduce memory usage. I hope that browsers will report the JSON errors to the console and I think at some point going forward we should probably introduce some kind of error object for XMLHttpRequest. -- Anne van Kesteren http://annevankesteren.nl/
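The exclusivity rule Anne describes can be pictured as a small lookup. The following is only a Python model of that rule, not browser code: the names `ALLOWED`, `InvalidStateError`, `check_access` and `can_access` are hypothetical, and it assumes that the default responseType (the empty string) leaves both attributes readable.

```python
# Hypothetical model of the gating rule described in the message above.
# Assumption: the default responseType "" permits both attributes.
ALLOWED = {
    "responseText": {"", "text"},
    "responseXML": {"", "document"},
}


class InvalidStateError(Exception):
    """Stand-in for the exception a browser would throw (name assumed)."""


def check_access(response_type: str, attribute: str) -> None:
    """Raise InvalidStateError if `attribute` may not be read for this type."""
    if response_type not in ALLOWED[attribute]:
        raise InvalidStateError(
            f"{attribute} is not available when responseType is {response_type!r}"
        )


def can_access(response_type: str, attribute: str) -> bool:
    """Boolean convenience wrapper around check_access."""
    try:
        check_access(response_type, attribute)
        return True
    except InvalidStateError:
        return False
```

Under this model a responseType of "json" leaves neither responseText nor responseXML readable, which is the behaviour the thread is debating.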
Re: [XHR] responseType json
On Sun, Dec 11, 2011 at 5:08 AM, Tab Atkins Jr. jackalm...@gmail.com wrote: On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com wrote: I'd like to bring up an issue with the spec with regard to responseText + the new json responseType. Currently it is written that responseText should throw an exception if the responseType is not "" or "text". I would argue that responseText should also return the plain text when the type is "json". Take the scenario of debugging an application, or an application that has an error reporting feature: if XHR.response returns null, meaning the JSON payload was not successfully parsed and/or was invalid, there is no means to retrieve the plain text that caused the error. null is rather useless at that point. See my WebKit bug for more context: https://bugs.webkit.org/show_bug.cgi?id=73648 For legacy reasons, responseText and responseXML continue to work together regardless of the responseType that is set. In other words, a responseType of "text" still allows access to responseXML, and a responseType of "document" still allows access to responseText. And it makes sense that this is so; if a strongly typed Document from responseXML cannot be created, responseText is the fallback to get the payload and either debug it, submit it as an error report, etc. I would argue that the json responseType would be more valuable if it behaved the same. Unlike the binary types (ArrayBuffer, Blob), json and document are backed by a plain text payload and therefore responseText has value in being accessible. If all we can get on a bad JSON response is null, I think there is little incentive for anyone to use the json type when they can use text and JSON.parse it themselves. What's the problem with simply setting responseType to 'text' when debugging? This does not satisfy the use cases of error reporting w/ contextual data nor the use case of debugging a runtime error in a production environment.
A nice benefit of *not* presenting the text by default is that the browser can throw the text away immediately, rather than keeping around the payload in both forms and paying for it twice in memory (especially since the text form will, I believe, generally be larger than the JSON form). Yes I agree, and it's what everyone w/ WebKit wants to try and accomplish. A good compromise would be to only throw it away (and thus restrict responseText access) upon the first successful parse when accessing .response. ~TJ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
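The compromise proposed above (keep the text only until the first successful parse) might look roughly like this. It is a Python sketch of the idea rather than anything WebKit does: `JsonResponse`, `response` and `response_text` are hypothetical names, Python's `json` module stands in for the browser's JSON parser, and UTF-8 decoding is assumed.

```python
import json


class JsonResponse:
    """Hypothetical model: the raw text is dropped after a successful parse."""

    def __init__(self, body: bytes):
        # Assumption: the payload is decoded as UTF-8, with replacement on errors.
        self._text = body.decode("utf-8", errors="replace")
        self._parsed = False
        self._value = None

    @property
    def response(self):
        """Parse lazily on first access; return None if the payload is invalid."""
        if not self._parsed:
            self._parsed = True
            try:
                self._value = json.loads(self._text)
                self._text = None  # success: the text copy can be freed
            except ValueError:
                self._value = None  # failure: keep the text for error reports
        return self._value

    @property
    def response_text(self):
        """Return the raw text, unless it was discarded by a successful parse."""
        if self._text is None:
            raise RuntimeError("text was discarded after a successful parse")
        return self._text
```

With this shape the memory cost of holding both forms is paid only for payloads that failed to parse, which are the ones an error report would want to include.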
Re: [XHR] responseType json
On Sun, Dec 11, 2011 at 9:08 AM, Jarred Nicholls jar...@sencha.com wrote: On Sun, Dec 11, 2011 at 5:08 AM, Tab Atkins Jr. jackalm...@gmail.com wrote: On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com wrote: I'd like to bring up an issue with the spec with regard to responseText + the new json responseType. Currently it is written that responseText should throw an exception if the responseType is not "" or "text". I would argue that responseText should also return the plain text when the type is "json". Take the scenario of debugging an application, or an application that has an error reporting feature: if XHR.response returns null, meaning the JSON payload was not successfully parsed and/or was invalid, there is no means to retrieve the plain text that caused the error. null is rather useless at that point. See my WebKit bug for more context: https://bugs.webkit.org/show_bug.cgi?id=73648 For legacy reasons, responseText and responseXML continue to work together regardless of the responseType that is set. In other words, a responseType of "text" still allows access to responseXML, and a responseType of "document" still allows access to responseText. And it makes sense that this is so; if a strongly typed Document from responseXML cannot be created, responseText is the fallback to get the payload and either debug it, submit it as an error report, etc. I would argue that the json responseType would be more valuable if it behaved the same. Unlike the binary types (ArrayBuffer, Blob), json and document are backed by a plain text payload and therefore responseText has value in being accessible. If all we can get on a bad JSON response is null, I think there is little incentive for anyone to use the json type when they can use text and JSON.parse it themselves. What's the problem with simply setting responseType to 'text' when debugging?
This does not satisfy the use cases of error reporting w/ contextual data nor the use case of debugging a runtime error in a production environment. Given that most user agents send the payload to the console, the debugging scenario is satisfied; so I renege on that. Error reporting is still a valid use case, albeit a rare requirement. A nice benefit of *not* presenting the text by default is that the browser can throw the text away immediately, rather than keeping around the payload in both forms and paying for it twice in memory (especially since the text form will, I believe, generally be larger than the JSON form). Yes I agree, and it's what everyone w/ WebKit wants to try and accomplish. A good compromise would be to only throw it away (and thus restrict responseText access) upon the first successful parse when accessing .response. ~TJ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On Sun, Dec 11, 2011 at 6:55 AM, Anne van Kesteren ann...@opera.com wrote: On Sun, 11 Dec 2011 06:10:26 +0100, Jarred Nicholls jar...@sencha.com wrote: For legacy reasons, responseText and responseXML continue to work together despite the responseType that is set. This is false. responseType text allows access to responseText, but not responseXML. document allows access to responseXML, but not responseText. I understand that's how you spec'ed it, but for legacy reasons it's not how it's implemented in IE or WebKit - which is what I meant in the above statement. Firefox (tested on 8) is the only one adhering to the spec as you described above. In WebKit and IE <= 9, a responseType of "", "text", or "document" means access to both responseXML and responseText. I don't know what IE10's behavior is yet. I'll be fighting a battle soon to get WebKit to be 100% compliant with the spec - and it's hard to convince others (harder than it should be) to change when IE doesn't behave in the same manner. The use case of error reporting w/ contextual data (i.e. the bad payload) is still unsatisfied, but it's not a common scenario. We made this exclusive to reduce memory usage. I hope that browsers will report the JSON errors to the console The net response is always logged in the console, so this is satisfactory for debugging purposes, just not realtime error handling. I think an error object would be good. JSON errors reported to the console are unlikely to be seen unless a defined exception is thrown, per spec. and I think at some point going forward we should probably introduce some kind of error object for XMLHttpRequest. One of the inconsistencies between browsers (including IE and WebKit) is how/when exceptions are thrown when accessing different properties (getResponseHeader, statusText, etc.). The spec often says to fail gracefully and return null or an empty string, etc., while IE and WebKit tend to throw exceptions instead.
Perhaps an XHR error object would be useful there. -- Anne van Kesteren http://annevankesteren.nl/ -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
I'd like to bring up an issue with the spec with regard to responseText + the new json responseType. Currently it is written that responseText should throw an exception if the responseType is not "" or "text". I would argue that responseText should also return the plain text when the type is "json". Take the scenario of debugging an application, or an application that has an error reporting feature: if XHR.response returns null, meaning the JSON payload was not successfully parsed and/or was invalid, there is no means to retrieve the plain text that caused the error. null is rather useless at that point. See my WebKit bug for more context: https://bugs.webkit.org/show_bug.cgi?id=73648 For legacy reasons, responseText and responseXML continue to work together regardless of the responseType that is set. In other words, a responseType of "text" still allows access to responseXML, and a responseType of "document" still allows access to responseText. And it makes sense that this is so; if a strongly typed Document from responseXML cannot be created, responseText is the fallback to get the payload and either debug it, submit it as an error report, etc. I would argue that the json responseType would be more valuable if it behaved the same. Unlike the binary types (ArrayBuffer, Blob), json and document are backed by a plain text payload and therefore responseText has value in being accessible. If all we can get on a bad JSON response is null, I think there is little incentive for anyone to use the json type when they can use text and JSON.parse it themselves. Comments, questions, and flames are welcomed! Thanks, Jarred
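The alternative mentioned at the end of this message (request text and parse it yourself) keeps the payload available when parsing fails. A minimal Python sketch of that pattern follows; `parse_with_fallback` is a hypothetical helper and Python's `json.loads` stands in for JSON.parse.

```python
import json


def parse_with_fallback(text: str):
    """Return (value, None) on success, or (None, text) if parsing fails.

    The second element carries the raw payload so that an error report can
    include the text that could not be parsed.
    """
    try:
        return json.loads(text), None
    except ValueError:
        return None, text
```

A caller would check the second element and, when it is not None, attach it to whatever error report the application sends.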
Re: [XHR] responseType json
On Sun, 04 Dec 2011 21:38:53 +0100, Bjoern Hoehrmann derhoe...@gmx.net wrote: I did not reverse-engineer the current proposal, but my impression is it would handle text and json differently with respect to the Unicode signature. I do not think that would be particularly desirable if true. Thanks, fixed; that was an oversight: http://dvcs.w3.org/hg/xhr/rev/edfeab9138a4 http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#json-response-entity-body -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
On Fri, 02 Dec 2011 14:00:26 +0100, Anne van Kesteren ann...@opera.com wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. FYI, I also tied it to ECMAScript's definition of JSON, which has some restrictions in place that the JSON RFC does not have. Given that ECMAScript thus far had the only platform-based implementation of JSON it made sense for XMLHttpRequest to follow that. -- Anne van Kesteren http://annevankesteren.nl/
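As a rough picture of what tying the json type to UTF-8 means, here is a Python sketch. It makes several assumptions: `json_response_entity_body` is a hypothetical name, a single leading U+FEFF is assumed to be dropped (one reading of the Unicode-signature fix mentioned in the thread), and Python's `json.loads` is only a stand-in for ECMAScript's JSON grammar (it is more lenient in places, for example it accepts NaN).

```python
import json


def json_response_entity_body(body: bytes, content_type: str = "application/json"):
    """Decode as UTF-8 regardless of any charset parameter, then parse.

    `content_type` is accepted only to show that it is ignored.
    Returns None when the payload is not valid JSON under these rules.
    """
    text = body.decode("utf-8", errors="replace")
    if text.startswith("\ufeff"):
        text = text[1:]  # assumption: a single leading BOM is dropped
    try:
        return json.loads(text)
    except ValueError:
        return None
```

Under this model a UTF-16 or Shift_JIS payload does not raise an encoding error; it decodes to text that usually fails to parse, so the caller just sees None.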
Re: [XHR] responseType json
What do you mean by treat content that clearly is UTF-32 as UTF-16-encoded? Do you mean interpreting it as a sequence of unsigned shorts? That would be a direct violation of the semantics of UTF-32, would it not? I'm not advocating the use of UTF-32 for interchange, but it does have the advantage of being fixed length encoding covering the entirety of Unicode. On Sun, Dec 4, 2011 at 1:41 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Henri Sivonen wrote: Browsers don't support UTF-32. It has no use cases as an interchange encoding beyond writing evil test cases. Defining it as a valid encoding is reprehensible. If UTF-32 is bad, then it should be detected as such and be rejected. The current idea, from what I can tell, is to ignore UTF-32 exists, and treat content that clearly is UTF-32 as UTF-16-encoded, which is much worse, as some components are likely to actually detect UTF-32, they would disagree with other components, and that tends to cause strange bugs and security issues. Thankfully, that is not a problem in this particular case.
Re: [XHR] responseType json
* Glenn Adams wrote: What do you mean by treat content that clearly is UTF-32 as UTF-16-encoded? Do you mean interpreting it as a sequence of unsigned shorts? That would be a direct violation of the semantics of UTF-32, would it not? Consider you have ... Content-Type: example/example;charset=utf-32 FF FE 00 00 ... Some would like to treat this as a UTF-16 encoded document starting with U+0000 after the Unicode signature, even though it clearly is UTF-32. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
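The disagreement between the two readings of those bytes can be seen with any decoder that supports both encodings. The Python snippet below is an illustration only; the trailing bytes for the letter A are added here to make the difference visible and are not part of the example in the message.

```python
# FF FE 00 00 followed by one UTF-32LE code unit for "A" (illustrative payload).
data = bytes([0xFF, 0xFE, 0x00, 0x00, 0x41, 0x00, 0x00, 0x00])

# Read as UTF-32: the first four bytes are the byte order mark.
as_utf32 = data.decode("utf-32")

# Read as UTF-16: only FF FE is the byte order mark, so the next 00 00
# becomes a U+0000 character and the padding bytes become another one.
as_utf16 = data.decode("utf-16")

print(repr(as_utf32))  # 'A'
print(repr(as_utf16))  # '\x00A\x00'
```

Two components that pick different readings therefore see different text for the same bytes, which is the interoperability and security concern raised in this message.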
Re: [XHR] responseType json
In the example you give, there is consistency between the content metadata (charset param) and the content itself (as determined by sniffing). So why would both the metadata and content be ignored? If there were an inconsistency (but there isn't) then [1] would apply, in which case the metadata can't be ignored without user consent. [1] http://www.w3.org/TR/webarch/#metadata-inconsistencies In any case, what is suggested below would be a direct violation of [2] as well. [2] http://www.w3.org/TR/charmod/#C030 On Mon, Dec 5, 2011 at 8:20 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Glenn Adams wrote: What do you mean by treat content that clearly is UTF-32 as UTF-16-encoded? Do you mean interpreting it as a sequence of unsigned shorts? That would be a direct violation of the semantics of UTF-32, would it not? Consider you have ... Content-Type: example/example;charset=utf-32 FF FE 00 00 ... Some would like to treat this as a UTF-16 encoded document starting with U+0000 after the Unicode signature, even though it clearly is UTF-32. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [XHR] responseType json
On Sun, 4 Dec 2011, Bjoern Hoehrmann wrote: The fight here is for standards. The fight, if you want to characterise it as such, is for interoperability, not standards. Standards are just a tool we use today for that purpose. For these purposes, we can ignore UTF-32. It's poorly implemented if at all, it's hardly ever used, and it provides no useful benefits for transport. Anything we can do to steer people more towards UTF-8 is a win. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [XHR] responseType json
On Mon, Dec 5, 2011 at 11:12 AM, Glenn Adams gl...@skynav.com wrote: In the example you give, there is consistency between the content metadata (charset param) and the content itself (as determined by sniffing). So why would both the metadata and content be ignored? Because in the real world, UTF-32 isn't a transfer encoding. Browsers shouldn't have to waste time supporting it, and if someone accidentally creates content in that encoding somehow, it should be immediately clear that something is wrong. It would take a major disconnect from reality to insist that browsers support UTF-32. In any case, what is suggested below would be a direct violation of [2] as well. [2] http://www.w3.org/TR/charmod/#C030 No, it wouldn't. That doesn't say that UTF-32 must be recognized. -- Glenn Maynard
Re: [XHR] responseType json
On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams gl...@skynav.com wrote: [2] http://www.w3.org/TR/charmod/#C030 No, it wouldn't. That doesn't say that UTF-32 must be recognized. You misread me. I am not saying or supporting that UTF-32 must be recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2]. It's impossible to violate that rule if the encoding isn't recognized. "When an IANA-registered charset name *is recognized*"; UTF-32 isn't recognized, so this is irrelevant. If a browser doesn't support UTF-32 as an incoming interchange format, then it should treat it as any other character encoding it does not recognize. It must not pretend it is another encoding. When an encoding is not recognized by the browser, the browser has full discretion in guessing the encoding. (See step 7 of http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.) It's perfectly reasonable for UTF-32 data to be detected as UTF-16. For example, UTF-32 data is likely to contain null bytes when scanned bytewise, and UTF-16 is the only supported encoding where that's likely to happen. Steps 7 and 8 give browsers unrestricted freedom in selecting the encoding when the previous steps are unable to do so; if they choose to include "if the charset is declared as UTF-32, return UTF-16" as one of their autodetection rules, the spec allows it. -- Glenn Maynard
Re: [XHR] responseType json
Let me choose my words more carefully. A browser may recognize UTF-32 (e.g., in a sniffer) without supporting it (either internally or for transcoding into a different internal encoding). If the browser supports UTF-32, then step (2) of [1] applies. [1] http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding But, if the browser does not support UTF-32, then the table in step (4) of [1] is supposed to apply, which would interpret the initial two bytes FF FE as UTF-16LE according to the current language of [1], and further, return a confidence level of certain. I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16BE only if the following two bytes are not 00. On Mon, Dec 5, 2011 at 11:45 AM, Glenn Maynard gl...@zewt.org wrote: On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams gl...@skynav.com wrote: [2] http://www.w3.org/TR/charmod/#C030 No, it wouldn't. That doesn't say that UTF-32 must be recognized. You misread me. I am not saying or supporting that UTF-32 must be recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2]. It's impossible to violate that rule if the encoding isn't recognized. When an IANA-registered charset name *is recognized*; UTF-32 isn't recognized, so this is irrelevant. If a browser doesn't support UTF-32 as an incoming interchange format, then it should treat it as any other character encoding it does not recognize. It must not pretend it is another encoding. When an encoding is not recognized by the browser, the browser has full discretion in guessing the encoding. (See step 7 of http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.) It's perfectly reasonable for UTF-32 data to be detected as UTF-16. For example, UTF-32 data is likely to contain null bytes when scanned bytewise, and UTF-16 is the only supported encoding where that's likely to happen. 
Steps 7 and 8 give browsers unrestricted freedom in selecting the encoding when the previous steps are unable to do so; if they choose to include "if the charset is declared as UTF-32, return UTF-16" as one of their autodetection rules, the spec allows it. -- Glenn Maynard
Re: [XHR] responseType json
On Mon, 5 Dec 2011, Glenn Adams wrote: I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16BE only if the following two bytes are not 00. The current text is intentional. UTF-32 is explicitly not supported by the HTML standard. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [XHR] responseType json
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote: But, if the browser does not support UTF-32, then the table in step (4) of [1] is supposed to apply, which would interpret the initial two bytes FF FE as UTF-16LE according to the current language of [1], and further, return a confidence level of certain. I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16BE only if the following two bytes are not 00. That wouldn't bring browsers and the spec closer together; it would bring them further apart. At first glance, it looks like it makes the spec allow WebKit and IE's behavior, which (unfortunately) includes UTF-32 detection, by allowing them to fall through to step 7, where they're allowed to detect things however they want. However, that's ignoring step 5. If step 4 passes through, then step 5 would happen next. That means this carefully-constructed file would be detected as UTF-8 by step 5: http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding That's not what happens in any browser; FF detects it as UTF-16 and WebKit and IE detect it as UTF-32. This change would require it to be detected as UTF-8, which would have security implications if implemented, e.g. a page outputting escaped user-inputted text in UTF-32 might contain a string like this, followed by a hostile script, when interpreted as UTF-8. This really isn't worth spending time on; you're free to press this if you like, but I'm moving on. -- Glenn Maynard
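The byte order mark check being argued over can be modelled in a few lines. This Python sketch is not spec text: `BOM_TABLE`, `sniff_bom` and `sniff_bom_proposed` are hypothetical names, the three-row table is recalled from the HTML parsing algorithm of the time and should be treated as an assumption, and the second function is one reading of the change Glenn Adams suggested.

```python
from typing import Optional, Tuple

# Assumed contents of the step-4 table: leading bytes -> encoding.
BOM_TABLE = [
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, str]]:
    """Current behaviour as described in the thread: FF FE always wins."""
    for bom, encoding in BOM_TABLE:
        if data.startswith(bom):
            return encoding, "certain"
    return None  # no match: later steps decide


def sniff_bom_proposed(data: bytes) -> Optional[Tuple[str, str]]:
    """Suggested variant: FF FE 00 00 is not claimed as UTF-16."""
    if data.startswith(b"\xff\xfe\x00\x00"):
        return None  # falls through to the later steps instead
    return sniff_bom(data)
```

The objection in this message is about what happens after `sniff_bom_proposed` returns None: the next step in the algorithm, not free-form autodetection, gets to decide.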
Re: [XHR] responseType json
The problem as I see it is that the current spec text for charset detection effectively *requires* a browser that does not support UTF-32 to explicitly ignore content metadata that may be correct (if it specifies UTF-32 as charset param), and further, to explicitly mis-label such content as UTF-16LE in the case that the first four bytes are FF FE 00 00. Indeed, the current algorithm requires mis-labelling such content as UTF-16LE with a confidence of certain. The current text is also ambiguous with respect to what support means in step (2) of Section 8.2.2.1 of [1]. Which of the following are meant by support? - recognize with sniffer - be capable of using directly as internal coding - be capable of transcoding to internal coding [1] http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding On Mon, Dec 5, 2011 at 3:10 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 5 Dec 2011, Glenn Adams wrote: I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16BE only if the following two bytes are not 00. The current text is intentional. UTF-32 is explicitly not supported by the HTML standard. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [XHR] responseType json
On Mon, 5 Dec 2011, Glenn Adams wrote: The problem as I see it is that the current spec text for charset detection effectively *requires* a browser that does not support UTF-32 to explicitly ignore content metadata that may be correct (if it specifies UTF-32 as charset param), and further, to explicitly mis-label such content as UTF-16LE in the case that the first four bytes are FF FE 00 00. Indeed, the current algorithm requires mis-labelling such content as UTF-16LE with a confidence of certain. Yes, this is explicitly intentional. The current text is also ambiguous with respect to what support means in step (2) of Section 8.2.2.1 of [1]. Which of the following are meant by support? To quote from the terminology section: The specification uses the term supported when referring to whether a user agent has an implementation capable of decoding the semantics of an external resource. - recognize with sniffer - be capable of using directly as internal coding - be capable of transcoding to internal coding I don't know how to distinguish the latter two in a black-box manner. Either of the latter two is a correct interpretation as far as I can tell. I suppose the current spec could be read such that the user agent could autodetect an unsupported encoding, but that wouldn't be very clever. I guess I can add some text to the spec to make that more obviously bad. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [XHR] responseType json
On Fri, 02 Dec 2011 17:03:37 +0100, Henri Sivonen hsivo...@iki.fi wrote: Does anyone actually transfer JSON as UTF-16? Note that you cannot transmit UTF-16 JSON from a page (sending text is UTF-8 only) so not being able to receive it either seems fine. I actually think that we should make text UTF-8-only too to enforce this symmetry. -- Anne van Kesteren http://annevankesteren.nl/
Re: [XHR] responseType json
On 2011-12-04 16:52, Anne van Kesteren wrote: On Fri, 02 Dec 2011 17:03:37 +0100, Henri Sivonen hsivo...@iki.fi wrote: Does anyone actually transfer JSON as UTF-16? Note that you cannot transmit UTF-16 JSON from a page (sending text is UTF-8 only) so not being able to receive it either seems fine. I actually think that we should make text UTF-8-only too to enforce this symmetry. I think you're confusing client abilities with server abilities. Just because XHR can't *send* JSON as UTF-16 doesn't mean servers don't send it to clients. It appears this needs more research. Best regards, Julian
Re: [XHR] responseType json
* Anne van Kesteren wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. The fight here is for standards. You know, you read the specification, create some content, and then that content works in all implementations that claim to implement the specification as you would assume based on reading the specification. You want to know how JSON content is handled by reading the JSON specification, and not the documentation for each and every JSON processor. That said, there are a number of media types by now that use the +json convention that's not actually defined anywhere authoritatively, and it is common to use other media types than application/json for JSON content, like application/javascript, and the rules there vary. Should it be possible to use the UTF-8 Unicode Signature? Types differ on that and it seems likely that implementations do as well. I did not reverse-engineer the current proposal, but my impression is it would handle text and json differently with respect to the Unicode signature. I do not think that would be particularly desirable if true. Anyway, given that it's difficult to tell which rules apply without some specification for +json and other things, I can't find much wrong with forcing the encoding to be UTF-8, especially because the other options that the JSON specification allows would result in a fatal error, which would be the same if implementations tried to detect the encoding, but then decided they do not support, say, UTF-32 encoded JSON. But it's not clear to me that the Unicode signature should result in a fatal error, if you ignore what the JSON specification says about encodings. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [XHR] responseType json
* Henri Sivonen wrote: Browsers don't support UTF-32. It has no use cases as an interchange encoding beyond writing evil test cases. Defining it as a valid encoding is reprehensible. If UTF-32 is bad, then it should be detected as such and be rejected. The current idea, from what I can tell, is to ignore UTF-32 exists, and treat content that clearly is UTF-32 as UTF-16-encoded, which is much worse, as some components are likely to actually detect UTF-32, they would disagree with other components, and that tends to cause strange bugs and security issues. Thankfully, that is not a problem in this particular case. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [XHR] responseType json
On 2011-12-02 14:00, Anne van Kesteren wrote: I added a json responseType http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#the-responsetype-attribute and JSON response entity body description: http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#json-response-entity-body This is based on a proposal by Gecko from a while back. I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. Well, it breaks legitimate JSON resources. What's the benefit? Best regards, Julian
Re: [XHR] responseType json
Le 2 déc. 2011 à 08:00, Anne van Kesteren a écrit : I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. Do we have stats on what is currently done on the Web with regards to the encoding? -- Karl Dubost - http://dev.opera.com/ Developer Relations Tools, Opera Software
Re: [XHR] responseType json
On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. That's a good fight, but I think this is the wrong battlefield. IIRC (valid) JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are detectable rather easily. The only thing this limitation is likely to bring is pain when dealing with resources outside one's control. -- Robin Berjon - http://berjon.com/ - @robinberjon
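The "detectable rather easily" point presumably refers to the heuristic in section 3 of RFC 4627, which looks at the pattern of zero bytes in the first four octets. A Python sketch of that heuristic follows; `detect_json_encoding` is a hypothetical name, and the function assumes the text has no byte order mark and starts with two ASCII characters, as the RFC's reasoning does.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null pattern of 4 octets."""
    if len(data) < 4:
        return "utf-8"  # too short to tell; fall back to the default
    nulls = tuple(byte == 0 for byte in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx
```

The json responseType discussed in this thread skips this step and decodes as UTF-8 unconditionally, which is the limitation Robin is objecting to.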
Re: [XHR] responseType json
On 2011-12-02 14:41, Robin Berjon wrote: On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. That's a good fight, but I think this is the wrong battlefield. IIRC (valid) JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are detectable rather easily. The only thing this limitation is likely to bring is pain when dealing with resources outside one's control. If there's agreement that UTF-8 should be mandated for JSON then this should apply to *all* of JSON, not only this use case. Best regards, Julian
Re: [XHR] responseType json
On Fri, Dec 2, 2011 at 3:41 PM, Robin Berjon ro...@berjon.com wrote: On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote: I tied it to UTF-8 to further the fight on encoding proliferation and encourage developers to always use that encoding. That's a good fight, but I think this is the wrong battlefield. IIRC (valid) JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are detectable rather easily. The only thing this limitation is likely to bring is pain when dealing with resources outside one's control. Browsers don't support UTF-32. It has no use cases as an interchange encoding beyond writing evil test cases. Defining it as a valid encoding is reprehensible. Does anyone actually transfer JSON as UTF-16? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/