Re: Security bug in XmlHttpRequest, setRequestHeader()
On Fri, 06 Jan 2012 00:26:25 +0100, Hill, Brad bh...@paypal-inc.com wrote: As this behavior is at least partially formally documented in http://tools.ietf.org/html/rfc3875#section-4.1.18 , and very widely implemented, the algorithm for XHR should be updated to at least consider _, and possibly all non-alphanumeric characters, as equivalent to - for purposes of comparison to the blacklisted header set. We do not consider this to be an issue. (If it's an issue at all, it's an issue with those libraries.) http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/thread.html#msg1349 -- Anne van Kesteren http://annevankesteren.nl/
Re: Security bug in XmlHttpRequest, setRequestHeader()
On 2012-01-06 09:49, Anne van Kesteren wrote: On Fri, 06 Jan 2012 00:26:25 +0100, Hill, Brad bh...@paypal-inc.com wrote: As this behavior is at least partially formally documented in http://tools.ietf.org/html/rfc3875#section-4.1.18 , and very widely implemented, the algorithm for XHR should be updated to at least consider _, and possibly all non-alphanumeric characters, as equivalent to - for purposes of comparison to the blacklisted header set. We do not consider this to be an issue. (If it's an issue at all, it's an issue with those libraries.) http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/thread.html#msg1349 See also the thread starting http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0317.html. If people are concerned by this, I'd recommend submitting an erratum for RFC 3875. Best regards, Julian
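To make the reported collision concrete, here is a minimal sketch of the header-name mapping described in RFC 3875 section 4.1.18; the function name is illustrative, not taken from any real CGI implementation:

```javascript
// RFC 3875 section 4.1.18: a CGI gateway derives the meta-variable name by
// uppercasing the header field name, replacing "-" with "_", and prefixing
// "HTTP_".
function cgiMetaVariable(headerName) {
  return "HTTP_" + headerName.toUpperCase().replace(/-/g, "_");
}

// Two distinct header names collide into the same meta-variable, which is why
// a blacklist that compares only the literal "-" spelling can be sidestepped:
cgiMetaVariable("Proxy-Authorization"); // → "HTTP_PROXY_AUTHORIZATION"
cgiMetaVariable("Proxy_Authorization"); // → "HTTP_PROXY_AUTHORIZATION"
```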
Re: [XHR] responseType json
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote: But, if the browser does not support UTF-32, then the table in step (4) of [1] is supposed to apply, which would interpret the initial two bytes FF FE as UTF-16LE according to the current language of [1], and further, return a confidence level of certain. I see the problem now. It seems that the table in step (4) should be changed to interpret an initial FF FE as UTF-16LE only if the following two bytes are not 00. That wouldn't bring browsers and the spec closer together; it would actually bring them further apart. At first glance, it looks like it makes the spec allow WebKit and IE's behavior, which (unfortunately) includes UTF-32 detection, by allowing them to fall through to step 7, where they're allowed to detect things however they want. However, that's ignoring step 5. If step 4 passes through, then step 5 would happen next. That means this carefully-constructed file would be detected as UTF-8 by step 5: http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding That's not what happens in any browser; FF detects it as UTF-16 and WebKit and IE detect it as UTF-32. This change would require it to be detected as UTF-8, which would have security implications if implemented, e.g. a page outputting escaped user-inputted text in UTF-32 might contain a string like this, followed by a hostile script, when interpreted as UTF-8. This really isn't worth spending time on; you're free to press this if you like, but I'm moving on. -- Glenn Maynard I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is. 
It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. As someone else stated, this is a good fight but probably not the right battlefield.
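For readers following the FF FE ambiguity discussed above, here is a hedged sketch of BOM sniffing over the first four bytes of a payload. It is illustrative only, not any browser's actual decoder; the key point is that the UTF-32 checks must run before the UTF-16 checks, because FF FE 00 00 (the UTF-32LE BOM) begins with FF FE (the UTF-16LE BOM):

```javascript
// Sketch of BOM detection. Order matters: the four-byte UTF-32 BOMs are
// checked before the two-byte UTF-16 BOMs they overlap with.
function sniffBOM(bytes) {
  const [b0, b1, b2, b3] = bytes;
  if (b0 === 0x00 && b1 === 0x00 && b2 === 0xfe && b3 === 0xff) return "UTF-32BE";
  if (b0 === 0xff && b1 === 0xfe && b2 === 0x00 && b3 === 0x00) return "UTF-32LE";
  if (b0 === 0xfe && b1 === 0xff) return "UTF-16BE";
  if (b0 === 0xff && b1 === 0xfe) return "UTF-16LE"; // prefix of the UTF-32LE BOM
  if (b0 === 0xef && b1 === 0xbb && b2 === 0xbf) return "UTF-8";
  return null; // no BOM; fall back to other detection or a default
}

sniffBOM([0xff, 0xfe, 0x00, 0x00]); // → "UTF-32LE" (UTF-32 checked first)
sniffBOM([0xff, 0xfe, 0x41, 0x00]); // → "UTF-16LE"
```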
Re: [XHR] responseType json
Please be careful with quote markers; you quoted text written by me as written by Glenn Adams. On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote: I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is. It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. If this behavior gets propagated into other browsers, that's even worse. Gecko doesn't support UTF-32, and adding it would be a huge step backwards. do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. As someone else stated, this is a good fight but probably not the right battlefield. Strongly disagree. 
Preventing legacy messes from being perpetuated into new APIs is one of the *only* battlefields available, where we can get people to stop using legacy encodings without breaking existing content. Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Type: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. -- Glenn Maynard
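The behavior Glenn proposes above can be sketched as follows. This is not spec text; the function and parameter names are hypothetical, and the UTF-8 decoding step is assumed to have already happened elsewhere:

```javascript
// Sketch of the proposal: if the response carries an explicit charset other
// than UTF-8, reject it outright (null entity body) rather than silently
// decoding as UTF-8 against the server's declaration.
function jsonResponseEntityBody(declaredCharset, utf8DecodedText) {
  if (declaredCharset && declaredCharset.toLowerCase() !== "utf-8") {
    return null; // explicitly conflicting charset: reject, don't guess
  }
  try {
    return JSON.parse(utf8DecodedText);
  } catch (e) {
    return null; // a parse error also yields null in the draft
  }
}

jsonResponseEntityBody("shift_jis", '{"a": 1}'); // → null (rejected)
jsonResponseEntityBody(null, '{"a": 1}');        // → { a: 1 }
```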
Re: [XHR] responseType json
On 2012-01-06 17:20, Glenn Maynard wrote: ... Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. ... You seem to feel strongly about this (and I might agree for UTF-32). How about raising this issue in a place where there's an actual chance to cause changes? (- IETF apps-discuss) Best regards, Julian
Re: [XHR] responseType json
On 1/6/12 11:20 AM, Glenn Maynard wrote: Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. I assume Jarred was talking about interoperability with content, not with other browsers. And thus start most races to the bottom in web-land -Boris
Re: [XHR] responseType json
On 2012-01-06 17:56, Boris Zbarsky wrote: On 1/6/12 11:20 AM, Glenn Maynard wrote: Accepting content that other browsers don't will result in pages being created that work only in WebKit. That gives the least interoperability, not the most. I assume Jarred was talking about interoperability with content, not with other browsers. And thus start most races to the bottom in web-land One could argue that it isn't a race to the bottom when the component accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that. Best regards, Julian
RE: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization
The HTML Speech XG worked for over a year prioritizing use cases against timelines and packaged all of that into a recommendation complete with IDLs and examples. So while I understand that WebApps may not have the time to review the entirety of this work, it's hard to see how dissecting it would speed the process of understanding. Perhaps a better approach would be to find half an hour to present to select members of WebApps the content of the recommendation and the possible relevance to their group. Does that sound reasonable? Thanks From: Glen Shires [mailto:gshi...@google.com] Sent: Wednesday, January 04, 2012 11:15 PM To: public-webapps@w3.org Cc: public-xg-htmlspe...@w3.org; Arthur Barstow; Dan Burnett Subject: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has recently wrapped up its work on use cases, requirements, and proposals for adding automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The work of the group is documented in the group's Final Report. [2] The members of the group intend this work to be input to one or more working groups, in W3C and/or other standards development organizations such as the IETF, as an aid to developing full standards in this space. Because that work was so broad, Art Barstow asked (below) for a relatively specific proposal. We at Google are proposing that a subset of it be accepted as a work item by the Web Applications WG. Specifically, we are proposing this Javascript API [3], which enables web developers to incorporate speech recognition and synthesis into their web pages. This simplified subset enables developers to use scripting to generate text-to-speech output and to use speech recognition as an input for forms, continuous dictation and control, and it supports the majority of use-cases in the Incubator Group's Final Report. 
We welcome your feedback and ask that the Web Applications WG consider accepting this Javascript API [3] as a work item. [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter [2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/ [3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html Bjorn Bringert Satish Sampath Glen Shires On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires gshi...@google.com wrote: Milan, The IDLs contained in both documents are in the same format and order, so it's relatively easy to compare the two side-by-side: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#speechreco-section and http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#api_description . The semantics of the attributes, methods and events have not changed, and both IDLs link directly to the definitions contained in the Speech XG Final Report. As you mention, we agree that the protocol portions of the Speech XG Final Report are most appropriate for consideration by a group such as the IETF, and believe such work can proceed independently, particularly because the Speech XG Final Report has provided a roadmap for these to remain compatible. Also, as shown in the Speech XG Final Report - Overview (http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#introductory), the Speech Web API is not dependent on the Speech Protocol and a Default Speech service can be used for local or remote speech recognition and synthesis. Glen Shires On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan milan.yo...@nuance.com wrote: Hello Glen, The proposal says that it contains a simplified subset of the JavaScript API. Could you please clarify which elements of the HTMLSpeech recommendation's JavaScript API were omitted? I think this would be the most efficient way for those of us familiar with the XG recommendation to evaluate the new proposal. 
I'd also appreciate clarification on how you see the protocol being handled. In the HTMLSpeech group we were thinking about this as a hand-in-hand relationship between W3C and IETF like WebSockets. Is this still your (and Google's) vision? Thanks From: Glen Shires [mailto:gshi...@google.com] Sent: Thursday, December 22, 2011 11:14 AM To: public-webapps@w3.org; Arthur Barstow Cc: public-xg-htmlspe...@w3.org; Dan Burnett Subject: Re: HTML Speech XG Completes, seeks feedback for eventual standardization We at Google believe that a scripting-only (Javascript) subset of the API defined in the Speech XG Incubator Group Final Report is of appropriate scope for consideration by the WebApps WG. The enclosed scripting-only subset supports the majority of the use-cases and samples in the XG proposal. Specifically, it enables web-pages to generate speech output and to use speech recognition as an input for forms, continuous dictation and control. The Javascript API will allow
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 11:20 AM, Glenn Maynard gl...@zewt.org wrote: Please be careful with quote markers; you quoted text written by me as written by Glenn Adams. Sorry, copying from the archives into Gmail is a pain. On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote: I'm getting responseType json landed in WebKit, and going to do so without the restriction of the JSON source being UTF-8. We default our decoding to UTF-8 if none is dictated by the server or overrideMIMEType(), but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or UTF-32(BE/LE) if the context is encoded as such, and accept the source as-is. It's a matter of having that perfect recipe of easiest implementation + most interoperability. It actually adds complication to our decoder if we Accepting content that other browsers don't will result in pages being created that work only in WebKit. WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. That gives the least interoperability, not the most. If this behavior gets propagated into other browsers, that's even worse. Gecko doesn't support UTF-32, and adding it would be a huge step backwards. We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time. do something special just for (perfectly legit) JSON payloads. I think keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be reducing our interoperability and complicating our code base. 
If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON grammar and JSON.parse will do the leg work. Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec. So let's change the IETF spec as well - are we even fighting that battle yet? Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. As someone else stated, this is a good fight but probably not the right battlefield. Strongly disagree. Preventing legacy messes from being perpetuated into new APIs is one of the *only* battlefields available, where we can get people to stop using legacy encodings without breaking existing content. without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON. Well, which is it? Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented. I suggest taking this initiative elsewhere (at least in parallel), i.e., getting RFC4627 to only support UTF-8 encoding if that's the larger picture. 
To say that a legit JSON source can be stored as any Unicode encoding but can only be transported as UTF-8 in this one particular XHR case is inconsistent and only leads to worse interoperability and confusion for those looking up these specs - if I go to the JSON spec first, I'll see all those encodings are supported and wonder why it doesn't work in this one instance. Are we out to totally confuse the hell out of authors? Anne: There's one related change I'd suggest. Currently, if a JSON response says Content-Type: application/json; charset=Shift_JIS, the explicit charset will be silently ignored and UTF-8 will be used. I think this should be explicitly rejected, returning null as the JSON response entity body. Don't decode as UTF-8 despite an explicitly conflicting header, or people will start sending bogus charset values without realizing it. +1 -- Glenn Maynard
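The workaround Jarred alludes to (responseText + JSON.parse) is worth spelling out, since it shows how easily a UTF-8-only restriction on responseType "json" can be circumvented. This is an illustrative helper, not production code; the object passed in stands in for a real XMLHttpRequest:

```javascript
// Authors who can't control the server's encoding can skip responseType
// "json" entirely: decode via responseText (which honors whatever charset
// the decoder settled on) and parse it themselves.
function parseJsonResponse(xhrLike) {
  try {
    return JSON.parse(xhrLike.responseText);
  } catch (e) {
    return null;
  }
}

// With a real XHR this would be used as:
//   xhr.onload = function () { var data = parseJsonResponse(xhr); /* ... */ };
parseJsonResponse({ responseText: '{"ok": true}' }); // → { ok: true }
```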
Re: [XHR] responseType json
On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. -Boris
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 3:18 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. I think my entire mail was quite clear that the spec is inconsistent with rfc4627 and perhaps that's where the changes need to happen, or else yield to it. Let's not be dogmatic here, I'm just pointing out the obvious disconnect. 
This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and oftentimes it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). Let's get Crockford on our side, let him know there's a lot of support for banishing UTF-16 and UTF-32 forever, and change rfc4627. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. IE and WebKit have gracefully handled UTF-32 for a long time in other parts of the platform, and despite it being an unsupported codec of the HTML spec, they've continued to do so. I've had nothing to do with this, so I'm not to be held responsible for its present perpetuation ;) My argument is focused around the JSON media type's spec, which blatantly contradicts it. -Boris -- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
Re: [XHR] responseType json
On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. With this kind of attitude, frankly, you shouldn't be implementing a spec. HTH Ms2ger
Re: Use Cases for Connectionless Push support in Webapps recharter
That is correct, the essential value in notification bearer flexibility is resource conservation and contextual adaptability (e.g. bearer selection when conditions warrant a change or limit choices). On Wednesday, January 4, 2012, Charles Pritchard ch...@jumis.com wrote: a) Don't drain the battery. b) Don't waste bandwidth. c) Don't use the more expensive connection when a less expensive connection is also available. On Jan 4, 2012, at 6:38 PM, Glenn Adams gl...@skynav.com wrote: what are the qualitative differences (if any) between these three use cases? On Tue, Jan 3, 2012 at 5:51 PM, Bryan Sullivan bls...@gmail.com wrote: I had an action item to provide some use cases for the Webapps recharter process, related to the Push based on extending server-sent events topic at the last F2F (draft API proposal that was presented: http://bkaj.net/w3c/eventsource-push.html). The intent of the action item was to establish a basis for a Webapps charter item related to extending eventsource (or coming up with a new API) for the ability to deliver arbitrary notifications/data to webapps via connectionless bearers, as informationally described in Server-Sent Events (http://dev.w3.org/html5/eventsource/). Here are three use cases: 1) One of Bob’s most-used apps is a social networking webapp which enables him to remain near-realtime connected to his friends and colleagues. During his busy social hours, when he’s out clubbing, his phone stays pretty much connected full time, with a constant stream of friend updates. He likes to remain just as connected though during less-busy times, for example during the workday as friends post their lunch plans or other updates randomly. While he wants his favorite app to remain ready to alert him, he doesn’t want the app to drain his battery just to remain connected during low-update periods. 2) Alice is a collector, and is continually watching or bidding in various online auctions. 
When auctions are about to close, she knows the activity can be fast and furious and is usually watching her auction webapp closely. But in the long slow hours between auction closings, she still likes for her webapp to alert her about bids and other auction updates as they happen, without delay. She needs for her auction webapp to enable her to continually watch multiple auctions without fear that its data usage during the slow periods will adversely impact her profits. 3) Bob uses a web based real-time communications service and he wants to be available to his friends and family even when his application is not running. Bob travels frequently and it is critical for him to optimize data usage and preserve battery. Bob’s friends can call him up to chat using video/audio or just text and he wants to make sure they can reach him irrespective of what device and what network he is connected at any given time. Comments/questions? -- Thanks, Bryan Sullivan -- Thanks, Bryan Sullivan
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:18 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 1/6/12 12:13 PM, Jarred Nicholls wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators. People never aim to create content that's cross-browser compatible per se, with a tiny minority of exceptions. People aim to create content that reaches users. What that means is that right now people are busy authoring webkit-only websites on the open web because they think that webkit is the only UA that will ever matter on mobile. And if you point out this assumption to these people, they will tell you right to your face that it's a perfectly justified assumption. The problem is bad enough that both Trident and Gecko have seriously considered implementing support for some subset of -webkit CSS properties. Note that people here includes divisions of Google. As a result, any time WebKit deviates from standards, that _will_ 100% guaranteed cause sites to be created that depend on those deviations; the other UAs then have the choice of not working on those sites or duplicating the deviations. We've seen all this before, circa 2001 or so. Maybe in this particular case it doesn't matter, and maybe the spec in this case should just change, but if so, please argue for that, as the rest of your mail does, not for the principle of shipping random spec violations just because you want to. In general if WebKit wants to do special webkitty things in walled gardens that's fine. Don't pollute the web with them if it can be avoided. Same thing applies to other UAs, obviously. 
I'm ambivalent about whether we should restrict to utf8 or not. On the one hand, having everyone on utf8 would greatly simplify the web. On the other hand, I can imagine this hurting download size for japanese/chinese websites (i.e., they'd want UTF-16). I agree with Boris that we don't need to pollute the web if we want to expose this to WebKit's walled-garden environments. We have mechanisms for exposing things only to those environments specifically to avoid this problem. Let's keep this discussion focused on what's best for the web. We can make WebKit do the appropriate thing.
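The download-size concern above is easy to quantify: most CJK characters take three bytes in UTF-8 but two in UTF-16, so mostly-CJK payloads can be noticeably smaller in UTF-16 (though gzip narrows the gap in practice). A quick check, assuming a TextEncoder-capable environment:

```javascript
// Byte-size comparison for a short all-CJK string. Each of these characters
// sits in the U+0800–U+FFFF range: 3 bytes in UTF-8, 2 in UTF-16.
const text = "日本語のテキスト"; // 8 characters
const utf8Bytes = new TextEncoder().encode(text).length; // 8 × 3 = 24
const utf16Bytes = text.length * 2;                      // 8 × 2 = 16 (all BMP)
```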
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote: On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. With this kind of attitude, frankly, you shouldn't be implementing a spec. I resent that comment, because I'm one of the few who fight in WebKit to get us 100% spec compliant in XHR (don't even get me started on how many violations there are in Firefox, IE, and Opera... WebKit isn't the only one, mind you), but that doesn't mean any spec addition, as fluid as it is in the early stages, is gospel. In this case I simply think it wasn't debated enough before going in - actually it wasn't debated at all, it was just placed in there, and now I'm a bad guy for pointing out its disconnect? I think your attitude is far poorer. The web platform changes all the time - if this matter is shored up, then implementations will change accordingly. HTH Ms2ger
Re: [XHR] responseType json
* Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and oftentimes it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). You have yet to explain how you propose Webkit should behave, and it is rather unclear to me whether the proposed behavior is in line with the existing HTTP, MIME, and JSON specifications. An HTTP response with Content-Type: application/json;charset=iso-8859-15, for instance, must not be treated as ISO-8859-15 encoded, as there is no charset parameter for the application/json media type and there is no other reason to treat it as ISO-8859-15; so it's either an error, or you silently ignore the unrecognized parameter. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
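Björn's example hinges on splitting the media type from its parameters. A minimal, illustrative Content-Type parse (not a full RFC 2045 parser; it ignores quoting and is only meant to show where the unrecognized charset parameter ends up):

```javascript
// Split "type/subtype;param=value;..." into the media type and a parameter
// map. For application/json, any charset found here is an unrecognized
// parameter per RFC 4627, to be treated as an error or silently ignored.
function parseContentType(value) {
  const [type, ...params] = value.split(";").map((s) => s.trim());
  const parameters = {};
  for (const p of params) {
    const eq = p.indexOf("=");
    if (eq > 0) parameters[p.slice(0, eq).toLowerCase()] = p.slice(eq + 1);
  }
  return { type: type.toLowerCase(), parameters };
}

parseContentType("application/json;charset=iso-8859-15");
// → { type: "application/json", parameters: { charset: "iso-8859-15" } }
```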
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:58 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: Long experience shows that people who say things like "I'm going to code against the Rec instead of the draft, because the Rec is more stable" I know that's a common error, but I never said I was going against a Rec. My point was that the editor's draft is fluid enough that it can be debated and changed, as it's clearly not perfect at any point in time. Debating a change to it doesn't put anyone in the wrong, and certainly doesn't mean I'm violating it - because tomorrow, my proposed violation could be the current state of the spec. RFC4627, for example, is six years old. This was right about the beginning of the time when "UTF-8 everywhere, dammit" was really starting to gain hold as a reasonable solution to encoding hell. Crockford, as well, is not a browser dev, nor is he closely connected to browser devs in a capacity that would really inform him of why supporting multiple encodings on the web is so painful. So, looking to that RFC for guidance on current best-practice is not a good idea. This issue has been debated and argued over for a long time, far predating the current XHR bit. There's a reason why new file formats produced in connection with web stuff are utf8-only. It's good for the web if we're consistent about this. ~TJ
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:54 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote: * Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything. This is a 2-way street, and oftentimes it's the spec that needs to change, not the implementation. The point is, there needs to be a very compelling reason to breach the contract of a media type's existing spec that would yield inconsistent results from the rest of the web platform layers, and involve taking away functionality that is working perfectly fine and can handle all the legit content that's already out there (as rare as it might be). You have yet to explain how you propose WebKit should behave, and it is rather unclear to me whether the proposed behavior is in line with the existing HTTP, MIME, and JSON specifications. An HTTP response with Content-Type: application/json;charset=iso-8859-15, for instance, must not be treated as ISO-8859-15 encoded, as there is no charset parameter for the application/json media type and there is no other reason to treat it as ISO-8859-15; so it's either an error, or you silently ignore the unrecognized parameter.

I think the spec should clarify this. I agree with Glenn Maynard's proposal: if a server sends a specific charset to use that isn't UTF-8, we should explicitly reject it, never decode or parse the text, and return null. Silently decoding as UTF-8 when the server or author is dictating something different could cause confusion.

-- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

-- *Sencha* Jarred Nicholls, Senior Software Architect @jarrednicholls http://twitter.com/jarrednicholls
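[Editorial aside: the reject-and-return-null rule discussed above can be sketched in plain JavaScript. This is a hypothetical model of the proposed behavior, not implementation code; `jsonResponse` and `finalEncoding` are invented names, with `finalEncoding` standing in for whatever encoding the decoder reports it actually used.]

```javascript
// Sketch of the proposed rule: the decoder runs normally and reports the
// encoding it used; .response then discards any JSON result that was not
// decoded as UTF-8 by returning null.
function jsonResponse(decodedText, finalEncoding) {
  if (finalEncoding.toLowerCase() !== "utf-8") {
    return null; // non-UTF-8 JSON is rejected at the XHR layer
  }
  try {
    return JSON.parse(decodedText);
  } catch (e) {
    return null; // the XHR draft also yields null on JSON parse errors
  }
}
```

Note that the restriction lives entirely above the decoder: nothing about decoding itself changes, only whether the parsed result is surfaced.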
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators.

If you support UTF-16 here, then people will use it. That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it. I don't know why we have to keep repeating this mistake.

We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time.

Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == json API. Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case.

XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer. I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8. This means the decoding would still take place for other encodings, but the end result would be discarded by XHR. This puts the handling for this restriction within the XHR layer, rather than at the decoder layer.

I said: Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using.

Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is.

without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON.

Well, which is it? This is a new feature; there isn't yet existing content using a responseType of json to be broken.

Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform

I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility. That's why it's being done more narrowly, where it can be done without breaking existing pages. If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them. Failing that, we should do what we can where we can.

and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented.

Of course not.
It tells the developer that something's wrong, and he has the choice of working around it or fixing his service. If just 25% of those people make the right choice, this is a win. It also helps discourage new services from being written using legacy encodings. We can't stop people from doing the wrong thing, but that doesn't mean we shouldn't point people in the right direction.

This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything.

This is the worst thing I've seen anyone say in here in a long time.

On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke julian.resc...@gmx.de wrote: One could argue that it isn't a race to the bottom when the component accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that.

First off, it's common and perfectly normal for an API exposing features from another spec to explicitly limit the allowed profile of that spec. Saying JSON through this API must be UTF-8 is perfectly OK. Second, this
Re: [XHR] responseType json
Sent from my iPhone

On Jan 6, 2012, at 7:11 PM, Glenn Maynard gl...@zewt.org wrote: On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote: WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators.

If you support UTF-16 here, then people will use it. That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it. I don't know why we have to keep repeating this mistake.

We're not adding anything here, it's a matter of complicating and taking away from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time.

Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == json API. Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case.

XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer. I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8. This means the decoding would still take place for other encodings, but the end result would be discarded by XHR. This puts the handling for this restriction within the XHR layer, rather than at the decoder layer.

That's why I'd like to see the spec changed to clarify the discarding if the encoding was supplied and isn't UTF-8.

I said: Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using. Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is.

None of that matters if a specific codec is the end-all be-all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case, unless the spec added that wording of discarding when encoding != UTF-8; then that can be enforced all in XHR with no decoder changes. I don't want to get hung up on explaining WebKit's specific impl. details.

without breaking existing content and yet killing UTF-16 and UTF-32 support just for responseType json would break existing UTF-16 and UTF-32 JSON.

Well, which is it? This is a new feature; there isn't yet existing content using a responseType of json to be broken. Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform

I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility. That's why it's being done more narrowly, where it can be done without breaking existing pages. If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them. Failing that, we should do what we can where we can.

and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented.

Of course not. It tells the developer that something's wrong, and he has the choice of working around it or fixing his service. If just 25% of those people make the right choice, this is a
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote: Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is.

None of that matters if a specific codec is the end-all be-all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case, unless the spec added that wording of discarding when encoding != UTF-8; then that can be enforced all in XHR with no decoder changes. I don't want to get hung up on explaining WebKit's specific impl. details.

All of the details I asked about are user-visible, not WebKit implementation details, and would need to be specified if encodings other than UTF-8 were allowed. I do think this should remain UTF-8 only, but if you want to discuss allowing other encodings, these are things that would need to be defined (which requires a clear proposal, not read the code). I assume it's not using the exact same decoder logic as HTML. After all, that would allow non-Unicode encodings.

-- Glenn Maynard
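[Editorial aside: for reference, the zero-byte detection from RFC 4627 section 3 that Glenn distinguishes from BOM detection looks at where NUL octets fall in the first four bytes. A sketch (the function name is invented); it works because a legal JSON text there begins with two ASCII characters:]

```javascript
// Encoding sniffing per RFC 4627 section 3: since a JSON text starts with
// two ASCII characters, the placement of 00 bytes among the first four
// octets identifies the Unicode encoding in use.
function sniffJsonEncoding(bytes) {
  if (bytes.length < 4) return "utf-8"; // too short to sniff; assume default
  const [b0, b1, b2, b3] = bytes;
  if (b0 === 0 && b1 === 0 && b2 === 0 && b3 !== 0) return "utf-32be";
  if (b0 === 0 && b1 !== 0 && b2 === 0 && b3 !== 0) return "utf-16be";
  if (b0 !== 0 && b1 === 0 && b2 === 0 && b3 === 0) return "utf-32le";
  if (b0 !== 0 && b1 === 0 && b2 !== 0 && b3 === 0) return "utf-16le";
  return "utf-8";
}
```

This is precisely the mechanism HTML never uses, which is why supporting it for JSON would mean a JSON-only special case somewhere in the pipeline.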
Re: Pressing Enter in contenteditable: p or br or div?
BCC: whatwg, CC: public-webapps since discussion of the editing spec has moved

I'm OK with this conclusion, but I still strongly prefer div to be the default single-line container name. Also, I'd really like the default single-line container name to be configurable in some way. Different apps have different needs and it's crappy for them to have to handle enter themselves just to get a different block type on enter. Something like document.execCommand(DefaultBlock, false, tagName). What values are valid for tagName are open to discussion. At a minimum, I'd want to see div, p and br. As one proof that this is valuable, the Closure editor supports these three with custom code and they are all used in different apps. I'm tempted to say that any block type should be allowed, but I'd be OK with starting with the three above. For example, I could see a use-case for li if you wanted an editable widget that only contained a single list.

Ojan

On Mon, May 30, 2011 at 1:16 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Thu, May 12, 2011 at 4:28 PM, Aryeh Gregor simetrical+...@gmail.com wrote: Behavior for Enter in contenteditable in current browsers seems to be as follows:

* IE9 wraps all lines in p (including if you start typing in an empty text box). If you hit Enter multiple times, it inserts empty ps. Shift-Enter inserts br.
* Firefox 4.0 just uses br _moz_dirty= for Enter and Shift-Enter, always. (What's _moz_dirty for?)
* Chrome 12 dev doesn't wrap a line when you start typing, but when you hit Enter it wraps the new line in a div. Hitting Enter multiple times outputs <div><br></div>, and Shift-Enter always inserts br.
* Opera 11.10 wraps in p like IE, but for blank lines it uses <p><br></p> instead of just <p></p> (they render the same).

What behavior do we want?
I ended up going with the general approach of IE/Opera: http://aryeh.name/spec/editcommands/editcommands.html#additional-requirements It turns out WebKit and Opera make the insertParagraph command behave essentially like hitting Enter, so I actually wrote all the requirements there (IE's and Firefox's behavior for insertParagraph was very different and didn't seem useful): http://aryeh.name/spec/editcommands/editcommands.html#the-insertparagraph-command

The basic idea is that if the cursor isn't wrapped in a single-line container (address, dd, div, dt, h*, li, p, pre) then the current line gets wrapped in a p. Then the current single-line container is split in two, mostly. Exceptions are roughly:

* For pre and address, insert a br instead of splitting the element. (This matches Firefox for pre and address, and Opera for pre but not address. IE/Chrome make multiple pres/addresses.)
* For an empty li/dt/dd, destroy it and break out of its container, so hitting Enter twice in a list breaks out of the list. (Everyone does this for li, only Firefox does for dt/dd.)
* If the cursor is at the end of an h* element, make the new element a p instead of a header. (Everyone does this.)
* If the cursor is at the end of a dd/dt element, it switches to dt/dd respectively. (Only Firefox does this, but it makes sense.)

Like the rest of the spec, this is still a rough draft and I haven't tried to pin corner cases down yet, so it's probably not advisable to try implementing it yet as written. As always, you can see how the spec implementation behaves for various input by looking at autoimplementation.html: http://aryeh.name/spec/editcommands/autoimplementation.html#insertparagraph
Re: [XHR] responseType json
On Jan 6, 2012, at 8:10 PM, Glenn Maynard gl...@zewt.org wrote: On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote: Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is.

None of that matters if a specific codec is the end-all be-all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case, unless the spec added that wording of discarding when encoding != UTF-8; then that can be enforced all in XHR with no decoder changes. I don't want to get hung up on explaining WebKit's specific impl. details.

All of the details I asked about are user-visible, not WebKit implementation details, and would need to be specified if encodings other than UTF-8 were allowed. I do think this should remain UTF-8 only, but if you want to discuss allowing other encodings, these are things that would need to be defined (which requires a clear proposal, not read the code).

Of course, I apologize; I didn't mean it as a dismissal, I just figured if we are settled on one codec then I'd spare ourselves the time. I'm also mobile :) I could provide you those details if no decoding changes (enforcement) were done in WebKit, if you'd like. But since this is a new API, might as well just stick to UTF-8.

I assume it's not using the exact same decoder logic as HTML. After all, that would allow non-Unicode encodings.

Not exact, but close.
For discussion's sake and in this context, you could call it the Unicode text decoder that does BOM detection and switches Unicode codecs automatically. For enforced UTF-8 I'd (have to) disable the BOM detection, but additionally could avoid decoding altogether if the specified encoding is not explicitly UTF-8 (and that was a part of the spec). We'll make it work either way :) -- Glenn Maynard
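[Editorial aside: the BOM detection Jarred describes can be sketched like this (illustrative only, not WebKit's decoder; the function name is invented). The key ordering point, also raised earlier in this thread, is that the four-byte UTF-32 BOMs must be checked before the two-byte UTF-16 BOMs, since FF FE 00 00 begins with FF FE:]

```javascript
// Sketch of BOM sniffing for a decoder that switches Unicode codecs
// automatically. FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE),
// so the four-byte checks must come first or UTF-32LE content is
// misdetected as UTF-16LE.
function sniffBOM(bytes) {
  const [b0, b1, b2, b3] = bytes;
  if (b0 === 0x00 && b1 === 0x00 && b2 === 0xfe && b3 === 0xff) return "utf-32be";
  if (b0 === 0xff && b1 === 0xfe && b2 === 0x00 && b3 === 0x00) return "utf-32le";
  if (b0 === 0xef && b1 === 0xbb && b2 === 0xbf) return "utf-8";
  if (b0 === 0xfe && b1 === 0xff) return "utf-16be";
  if (b0 === 0xff && b1 === 0xfe) return "utf-16le";
  return null; // no BOM present
}
```

Note the residual ambiguity: a UTF-16LE stream whose first character is U+0000 also begins FF FE 00 00, which is exactly the detection hazard debated in the earlier responseType json messages.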
[editing] tab in an editable area WAS: [whatwg] behavior when typing in contentEditable elements
BCC: whatwg, CC: public-webapps since discussion of the editing spec has moved

On Tue, Jun 14, 2011 at 12:54 PM, Aryeh Gregor simetrical+...@gmail.com wrote: You suggest that the tab key in browsers should act like indent, as in dedicated text editors. This isn't tenable -- it means that if you're using Tab to cycle through focusable elements on the page, as soon as it hits a contenteditable area it will get stuck and start doing something different. No current browser does this, for good reason.

There are strong use-cases for both. In an app like Google Docs you certainly want tab to act like indent. In a mail app, it's more of a toss-up. In something like the Google+ sharing widget, you certainly want it to maintain normal web tabbing behavior. Anecdotally, Gmail has an internal lab to enable document-like tabbing behavior and it is crazy popular. People gush over it.

We should make this configurable via execCommand: document.execCommand(TabBehavior, false, bitmask). The bitmask is because you might want a different set of behaviors:

- Tabbing in lists
- Tabbing in table cells
- Tabbing in blockquotes
- Tab in none of the above inserts a tab
- Tab in none of the above inserts X spaces (X is controlled by the CSS tab-size property?)

Ojan
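[Editorial aside: a sketch of how the proposed bitmask might look. Only the behavior names come from the proposal; the constant names and flag values below are invented for illustration, since the proposal assigns none.]

```javascript
// Hypothetical flag values for the proposed TabBehavior execCommand.
const TAB_INDENTS_LISTS        = 1 << 0; // Tab indents/outdents list items
const TAB_MOVES_IN_TABLE_CELLS = 1 << 1; // Tab moves between table cells
const TAB_INDENTS_BLOCKQUOTES  = 1 << 2; // Tab (un)wraps in blockquotes
const TAB_INSERTS_TAB          = 1 << 3; // otherwise, insert a tab character

// A Docs-like editor might enable everything except literal tab insertion:
const docsLike = TAB_INDENTS_LISTS | TAB_MOVES_IN_TABLE_CELLS | TAB_INDENTS_BLOCKQUOTES;
// document.execCommand("TabBehavior", false, String(docsLike));
```

A bitmask keeps the three context-specific behaviors and the fallback insertion independently toggleable, which is the point of the proposal.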
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 1:45 PM, Jarred Nicholls jar...@webkit.org wrote: On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote: On 01/06/2012 10:28 PM, Jarred Nicholls wrote: This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything.

With this kind of attitude, frankly, you shouldn't be implementing a spec.

I resent that comment, because I'm one of the few that fight in WebKit to get us 100% spec compliant in XHR (don't even get me started with how many violations there are in Firefox, IE, and Opera... WebKit isn't the only one, mind you), but that doesn't mean any spec addition, as fluid as it is in the early stages, is gospel. In this case I simply think it wasn't debated enough before going in - actually it wasn't debated at all, it was just placed in there, and now I'm a bad guy for pointing out its disconnect? I think your attitude is far poorer. The web platform changes all the time - if this matter is shored up, then implementations will change accordingly.

While Ms2ger was a bit short, there's a reason. Long experience shows that people who say things like I'm going to code against the Rec instead of the draft, because the Rec is more stable often end up causing pain for everyone else, because that more stable Rec is also *more wrong*, precisely because stable means hasn't been updated to take into account new information or to fix bugs. This happens even for smaller differences - well-meaning devs coding to the Working Draft of a spec on /TR instead of the error-corrected Editor's Draft cause never-ending pain.

Old RFCs are also often a source of pain, because we quite often find that the authors aren't fully versed in the complexities and subtleties of the public web. They may be operating from an academic or corporate standpoint, or otherwise be contained in a local experience-minimum that affects their view of what's reasonable. RFC4627, for example, is six years old.
This was right about the beginning of the time when UTF-8 everywhere, dammit was really starting to gain hold as a reasonable solution to encoding hell. Crockford, as well, is not a browser dev, nor is he closely connected to browser devs in a capacity that would really inform him of why supporting multiple encodings on the web is so painful. So, looking to that RFC for guidance on current best-practice is not a good idea. This issue has been debated and argued over for a long time, far predating the current XHR bit. There's a reason why new file formats produced in connection with web stuff are utf8-only. It's good for the web if we're consistent about this. ~TJ
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 12:36 PM, Ojan Vafai o...@chromium.org wrote: I'm ambivalent about whether we should restrict to utf8 or not. On the one hand, having everyone on utf8 would greatly simplify the web. On the other hand, I can imagine this hurting download size for Japanese/Chinese websites (i.e. they'd want utf-16).

Note that this may be subject to the same counter-intuitive forces that cause UTF-8 to usually be better for CJK HTML pages (because a lot of the source is ASCII markup). In JSON, all of the markup artifacts (braces, brackets, quotes, colons, commas, spaces) are ASCII, along with numbers, bools, and null. Only the contents of strings can be non-ASCII. JSON is generally lighter on markup than XML-like languages, so the effect may not be as pronounced, but it shouldn't be dismissed without some study. At minimum, it will *reduce* the size difference between the two.

~TJ
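[Editorial aside: Tab's point is easy to check with a throwaway measurement on illustrative data (the sample object below is made up, not a real payload). In a JSON text the braces, quotes, and keys cost one byte each in UTF-8 but two in UTF-16, which offsets the CJK string contents costing three bytes instead of two:]

```javascript
// Compare encoded sizes of a small CJK-flavored JSON text. ASCII markup
// is 1 byte/char in UTF-8 vs 2 in UTF-16; BMP CJK is 3 vs 2. Which
// encoding wins depends on the markup-to-CJK ratio of the payload.
const json = JSON.stringify({ title: "メインページ", lang: "ja", id: 42 });
const utf8Bytes = Buffer.byteLength(json, "utf8");
const utf16Bytes = Buffer.byteLength(json, "utf16le"); // BOM-less UTF-16LE
console.log(utf8Bytes, utf16Bytes);
```

For this markup-heavy sample, UTF-8 comes out smaller despite every string being Japanese; a payload dominated by long CJK strings would tilt the other way.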
Re: [XHR] responseType json
On Fri, Jan 6, 2012 at 4:45 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: Note that this may be subject to the same counter-intuitive forces that cause UTF-8 to usually be better for CJK HTML pages (because a lot of the source is ASCII markup). In JSON, all of the markup artifacts (braces, brackets, quotes, colons, commas, spaces) are ASCII, along with numbers, bools, and null. Only the contents of strings can be non-ASCII. JSON is generally lighter on markup than XML-like languages, so the effect may not be as pronounced, but it shouldn't be dismissed without some study. At minimum, it will *reduce* the size difference between the two.

And more fundamentally, this is trying to repurpose charsets as a compression mechanism. If you want compression, use compression (Transfer-Encoding: gzip):

-rw-rw-r-- 1 glenn glenn 7274 Jan 06 23:59 test-utf8.txt
-rw-rw-r-- 1 glenn glenn 3672 Jan 06 23:59 test-utf8.txt.gz
-rw-rw-r-- 1 glenn glenn 6150 Jan 06 23:59 test-utf16.txt
-rw-rw-r-- 1 glenn glenn 3468 Jan 06 23:59 test-utf16.txt.gz

The difference even without compression isn't enough to warrant the complexity (~15%), and with compression the difference is under 10%. (Test case is simply copying the rendered text from http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8 in Firefox.)

-- Glenn Maynard