On Jan 6, 2012, at 7:11 PM, Glenn Maynard <[email protected]> wrote:

> On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls <[email protected]> wrote:
> WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine. The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make. So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators.
>
> If you support UTF-16 here, then people will use it. That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it. I don't know why we have to keep repeating this mistake.
>
> We're not adding anything here, it's a matter of complicating and "taking away" from our decoder for one particular case. You're acting like we're adding UTF-32 support for the first time.
>
> Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == "json" API.
>
> Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case.
>
> XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer. I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8. This means the decoding would still take place for other encodings, but the end result would be discarded by XHR.
> This puts the handling for this restriction within the XHR layer, rather than at the decoder layer.

That's why I'd like to see the spec changed to clarify the discarding if the encoding was supplied and isn't UTF-8.

> I said:
> Also, I'm a bit confused. You talk about the rudimentary encoding detection in the JSON spec (rfc4627 sec3), but you also mention HTTP mechanisms (HTTP headers and overrideMimeType). These are separate and unrelated. If you're using HTTP mechanisms, then the JSON spec doesn't enter into it. If you're using both HTTP headers (HTTP) and UTF-32 BOM detection (rfc4627), then you're using a strange mix of the two. I can't tell what mechanism you're actually using.
>
> Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection. My question remains, though: what exactly are you doing? Do you do zero-byte detection? Do you do BOM detection? What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree? All of this would need to be specified; currently none of it is.

None of that matters if a specific codec is the be-all and end-all. If that's the consensus then that's it, period. WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; the XHR payload runs through it before it would pass to JSON.parse. Read the code if you're interested. I would need to change the text decoder to skip BOM detection for this one case, unless the spec added that wording of discarding when encoding != UTF-8; then that can be enforced entirely in XHR with no decoder changes. I don't want to get hung up on explaining WebKit's specific implementation details.

> "without breaking existing content" and yet killing UTF-16 and UTF-32 support just for responseType "json" would break existing UTF-16 and UTF-32 JSON. Well, which is it?
>
> This is a new feature; there isn't yet existing content using a responseType of "json" to be broken.
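
For concreteness, the XHR-layer discard discussed above (decode as usual, then return null from .response unless the decoder's final encoding was UTF-8) might look roughly like the sketch below. This is not WebKit's code: `DecodeResult`, `finalEncoding`, and `jsonResponse` are hypothetical names, and returning null for malformed JSON is an assumption of the sketch.

```typescript
// Hypothetical shape of what a shared text decoder could hand back to XHR.
interface DecodeResult {
  text: string;          // the decoded payload
  finalEncoding: string; // the encoding the decoder actually ended up using
}

// Sketch of the check done in the XHR layer when .response is queried with
// responseType == "json". The decoder itself needs no special case: decoding
// has already happened, and the result is simply discarded unless it was UTF-8.
function jsonResponse(result: DecodeResult): unknown {
  const encoding = result.finalEncoding.trim().toLowerCase();
  if (encoding !== "utf-8" && encoding !== "utf8") {
    return null; // non-UTF-8 payload: discard the decoded text
  }
  try {
    return JSON.parse(result.text);
  } catch (_err) {
    return null; // assumption: malformed JSON also yields null
  }
}
```

With this shape, a UTF-16 payload still goes through the shared decoder, but `jsonResponse({ text, finalEncoding: "UTF-16LE" })` yields null, so the restriction lives entirely in XHR.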
> Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform. But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform
>
> I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility. That's why it's being done more narrowly, where it can be done without breaking existing pages. If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them. Failing that, we should do what we can where we can.
>
> and also where the web platform defers to external specs (e.g. JSON). In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented.
>
> Of course not. It tells the developer that something's wrong, and he has the choice of working around it or fixing his service. If just 25% of those people make the right choice, this is a win. It also helps discourage new services from being written using legacy encodings. We can't stop people from doing the wrong thing, but that doesn't mean we shouldn't point people in the right direction.
>
> This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything.
>
> This is the worst thing I've seen anyone say in here in a long time.

Wtaf, why is everyone taking this point so far out of context? I was trying to make the point that things change overnight... I've already explained and I won't do it again. Relax already, it's Friday!

> On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke <[email protected]> wrote:
> One could argue that it isn't a race "to the bottom" when the component accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that.
>
> First off, it's common and perfectly normal for an API exposing features from another spec to explicitly limit the allowed profile of that spec. Saying "JSON through this API must be UTF-8" is perfectly OK.
>
> Second, this isn't an issue of the JSON spec at all. As described so far (somewhat vaguely), his charset detection *isn't* what's described by rfc4627, which only describes UTF-16 and UTF-32 zero-byte detection (and that vaguely--it isn't even normative). Rather, it's also mixing in bits from HTTP (the Content-Type header, which I assume is what was meant by "dictated by the server" in the original message) and XHR (the overrideMimeType method). None of that is defined by rfc4627, which makes WebKit's behavior ad hoc, and none of this will be fixed by changes to rfc4627 (which obviously shouldn't talk about HTTP headers).
>
> --
> Glenn Maynard
>
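
For reference, the zero-byte detection in rfc4627 sec3 that keeps coming up in this thread is small enough to sketch. It relies on the RFC's premise that the first two characters of a JSON text are always ASCII, so the pattern of zero bytes in the first four octets gives away the encoding. The function name `detectJsonEncoding` and the fallback to UTF-8 for inputs shorter than four bytes are assumptions of this sketch; the RFC does not cover that case, and this is not what any particular browser does.

```typescript
type Rfc4627Encoding = "UTF-8" | "UTF-16BE" | "UTF-16LE" | "UTF-32BE" | "UTF-32LE";

// Zero-byte detection per rfc4627 sec3: look only at which of the first four
// octets are zero. No BOM sniffing, no HTTP headers, no overrideMimeType.
function detectJsonEncoding(bytes: Uint8Array): Rfc4627Encoding {
  if (bytes.length < 4) {
    return "UTF-8"; // assumption: the RFC's table only covers four octets
  }
  const isZero = (i: number): boolean => bytes[i] === 0;

  if (isZero(0) && isZero(1) && isZero(2) && !isZero(3)) return "UTF-32BE";  // 00 00 00 xx
  if (isZero(0) && !isZero(1) && isZero(2) && !isZero(3)) return "UTF-16BE"; // 00 xx 00 xx
  if (!isZero(0) && isZero(1) && isZero(2) && isZero(3)) return "UTF-32LE";  // xx 00 00 00
  if (!isZero(0) && isZero(1) && !isZero(2) && isZero(3)) return "UTF-16LE"; // xx 00 xx 00
  return "UTF-8";                                                            // xx xx xx xx
}
```

Note that nothing here says how this result should rank against a Content-Type charset or overrideMimeType when they disagree, which is exactly the precedence question raised in the thread.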
