Re: [XHR] responseType json

2012-01-07 Thread Anne van Kesteren

On Fri, 06 Jan 2012 17:20:04 +0100, Glenn Maynard gl...@zewt.org wrote:

Anne: There's one related change I'd suggest.  Currently, if a JSON
response says Content-Type: application/json; charset=Shift_JIS,
the explicit charset will be silently ignored and UTF-8 will be used.
I think this should be explicitly rejected, returning null as the JSON
response entity body.  Don't decode as UTF-8 despite an explicitly
conflicting header, or people will start sending bogus charset values
without realizing it.


I don't think there's a single media type parameter that causes fatal  
error handling so I do not think that would be a good idea. E.g.  
text/event-stream;charset=hz-gb-2312 will give you utf-8 decoding too.  
text/html;charset=foobar will give you whatever is the default in HTML,  
etc.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2012-01-07 Thread Julian Reschke

On 2012-01-06 22:58, Tab Atkins Jr. wrote:
 ...

RFC4627, for example, is six years old.  This was right about the
beginning of the time when "UTF-8 everywhere, dammit" was really
starting to gain hold as a reasonable solution to encoding hell.
Crockford, as well, is not a browser dev, nor is he closely connected
to browser devs in a capacity that would really inform him of why
supporting multiple encodings on the web is so painful.  So, looking
to that RFC for guidance on current best-practice is not a good idea.
...


This is misleading. RFC 4627 is *written* by Douglas, but not owned by 
him. There is a change procedure in place. If you really really believe 
something needs to change, use it. (First step would be to subscribe to 
IETF apps-discuss and explain the problem.)


Best regards, Julian



Re: [XHR] responseType json

2012-01-07 Thread Julian Reschke

On 2012-01-07 10:48, Anne van Kesteren wrote:

On Fri, 06 Jan 2012 17:20:04 +0100, Glenn Maynard gl...@zewt.org wrote:

Anne: There's one related change I'd suggest. Currently, if a JSON
response says Content-Type: application/json; charset=Shift_JIS,
the explicit charset will be silently ignored and UTF-8 will be used.
I think this should be explicitly rejected, returning null as the JSON
response entity body. Don't decode as UTF-8 despite an explicitly
conflicting header, or people will start sending bogus charset values
without realizing it.


I don't think there's a single media type parameter that causes fatal
error handling so I do not think that would be a good idea. E.g.
text/event-stream;charset=hz-gb-2312 will give you utf-8 decoding too.
text/html;charset=foobar will give you whatever is the default in HTML,
etc.


charset is undefined on application/json, so ignoring it is the right thing.

text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as 
far as I understand the spec), so if this defaults to UTF-8 this is just 
an effect of the specified error handling.


Best regards, Julian




Re: [XHR] responseType json

2012-01-07 Thread Anne van Kesteren
On Sat, 07 Jan 2012 11:30:42 +0100, Julian Reschke julian.resc...@gmx.de  
wrote:
charset is undefined on application/json, so ignoring it is the right  
thing.


text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as  
far as I understand the spec), so if this defaults to UTF-8 this is just  
an effect of the specified error handling.


I guess so. FWIW, the theory that 'charset' is defined for certain media  
types and not for others is not necessarily implemented that way. E.g.  
XMLHttpRequest text decoding just searches for a 'charset' parameter  
regardless of what the media type is. Not sure if that is the only context  
in which implementations diverge from the theoretical model (the  
theoretical model is kind of impossible to work with for generic code).
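
For concreteness, a minimal TypeScript sketch of that kind of generic 'charset'  
lookup - the parsing is deliberately simplified (no quoted-string or full MIME  
grammar handling) and the function name is purely illustrative:

    // Pull a 'charset' parameter out of any media type string, without
    // caring what the type itself is (simplified parameter parsing).
    function charsetOf(contentType: string): string | null {
      for (const param of contentType.split(";").slice(1)) {
        const eq = param.indexOf("=");
        if (eq === -1) continue;
        const name = param.slice(0, eq).trim().toLowerCase();
        const value = param.slice(eq + 1).trim().replace(/^"|"$/g, "");
        if (name === "charset" && value) return value.toLowerCase();
      }
      return null;
    }

    charsetOf("application/json;charset=iso-8859-15");  // "iso-8859-15"
    charsetOf("text/html");                              // null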



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2012-01-07 Thread Julian Reschke

On 2012-01-07 15:15, Anne van Kesteren wrote:

On Sat, 07 Jan 2012 11:30:42 +0100, Julian Reschke
julian.resc...@gmx.de wrote:

charset is undefined on application/json, so ignoring it is the right
thing.

text/event-stream;charset=hz-gb-2312 on the other hand is invalid (as
far as I understand the spec), so if this defaults to UTF-8 this is
just an effect of the specified error handling.


I guess so. FWIW, the theory that 'charset' is defined for certain media
types and not for others is not necessarily implemented that way. E.g.
XMLHttpRequest text decoding just searches for a 'charset' parameter
regardless of what the media type is. Not sure if that is the only
context in which implementations diverge from the theoretical model (the
theoretical model is kind of impossible to work with for generic code).


For text/* this is ok. For others, maybe not. It's still better than 
having to special-case each and every media type...


Best regards, Julian



Re: [XHR] responseType json

2012-01-07 Thread Anne van Kesteren
On Sat, 07 Jan 2012 02:55:15 +0100, Jarred Nicholls jar...@webkit.org  
wrote:
Not exact, but close.  For discussion's sake and in this context, you  
could call it the Unicode text decoder that does BOM detection and  
switches Unicode codecs automatically.  For enforced UTF-8 I'd (have to)  
disable the BOM detection, but additionally could avoid decoding  
altogether if the specified encoding is not explicitly UTF-8 (and that  
was a part of the spec).  We'll make it work either way :)


FYI, if WebKit cannot do pure UTF-8 decoding (i.e. ignoring everything  
else), WebKit has bugs in its server-sent events (EventSource), Web  
Workers, WebVTT, and Web Sockets implementations. Potentially more; I'm not  
sure if this list is still complete.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote:

 But, if the browser does not support UTF-32, then the table in step (4) of
  [1] is supposed to apply, which would interpret the initial two bytes FF
 FE
  as UTF-16LE according to the current language of [1], and further,
 return a
  confidence level of certain.
 
  I see the problem now. It seems that the table in step (4) should be
  changed to interpret an initial FF FE as UTF-16LE only if the following
 two
  bytes are not 00.
 


 That wouldn't actually bring browsers and the spec closer together; it
 would actually bring them further apart.


 At first glance, it looks like it makes the spec allow WebKit and IE's
 behavior, which (unfortunately) includes UTF-32 detection, by allowing them
 to fall through to step 7, where they're allowed to detect things however
 they want.


 However, that's ignoring step 5.  If step 4 passes through, then step 5
 would happen next.  That means this carefully-constructed file would be
 detected as UTF-8 by step 5:


 http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding


 That's not what happens in any browser; FF detects it as UTF-16 and WebKit
 and IE detect it as UTF-32.  This change would require it to be detected as
 UTF-8, which would have security implications if implemented, eg. a page
 outputting escaped user-inputted text in UTF-32 might contain a string like
 this, followed by a hostile script, when interpreted as UTF-8.


 This really isn't worth spending time on; you're free to press this if you
 like, but I'm moving on.


 --
 Glenn Maynard


I'm getting responseType json landed in WebKit, and going to do so
without the restriction of the JSON source being UTF-8.  We default our
decoding to UTF-8 if none is dictated by the server or overrideMIMEType(),
but we also do BOM detection and will gracefully switch to UTF-16(BE/LE) or
UTF-32(BE/LE) if the content is encoded as such, and accept the source
as-is.

It's a matter of having that perfect recipe of easiest implementation +
most interoperability.  It actually adds complication to our decoder if we
do something special just for (perfectly legit) JSON payloads.  I think
keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will
be reducing our interoperability and complicating our code base.  If we
don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the
JSON grammar and JSON.parse will do the leg work.  As someone else stated,
this is a good fight but probably not the right battlefield.


Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
Please be careful with quote markers; you quoted text written by me as
written by Glenn Adams.

On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org wrote:
 I'm getting responseType json landed in WebKit, and going to do so without
 the restriction of the JSON source being UTF-8.  We default our decoding to
 UTF-8 if none is dictated by the server or overrideMIMEType(), but we also
 do BOM detection and will gracefully switch to UTF-16(BE/LE) or
 UTF-32(BE/LE) if the content is encoded as such, and accept the source
 as-is.

 It's a matter of having that perfect recipe of easiest implementation +
 most interoperability.  It actually adds complication to our decoder if we

Accepting content that other browsers don't will result in pages being
created that work only in WebKit.  That gives the least
interoperability, not the most.

If this behavior gets propagated into other browsers, that's even
worse.  Gecko doesn't support UTF-32, and adding it would be a huge
step backwards.

 do something special just for (perfectly legit) JSON payloads.  I think
 keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will be
 reducing our interoperability and complicating our code base.  If we don't
 want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON
 grammar and JSON.parse will do the leg work.

Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec.

Also, I'm a bit confused.  You talk about the rudimentary encoding
detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
mechanisms (HTTP headers and overrideMimeType).  These are separate
and unrelated.  If you're using HTTP mechanisms, then the JSON spec
doesn't enter into it.  If you're using both HTTP headers (HTTP) and
UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
two.  I can't tell what mechanism you're actually using.

 As someone else stated, this is a good fight but probably not the right 
 battlefield.

Strongly disagree.  Preventing legacy messes from being perpetuated
into new APIs is one of the *only* battlefields available, where we
can get people to stop using legacy encodings without breaking
existing content.

Anne: There's one related change I'd suggest.  Currently, if a JSON
response says Content-Type: application/json; charset=Shift_JIS,
the explicit charset will be silently ignored and UTF-8 will be used.
I think this should be explicitly rejected, returning null as the JSON
response entity body.  Don't decode as UTF-8 despite an explicitly
conflicting header, or people will start sending bogus charset values
without realizing it.
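
A rough TypeScript sketch of the check being proposed here - the helper name and  
the exact shape are illustrative only, not spec text:

    // Proposed behaviour: if the response declares a charset and it is not
    // UTF-8, the JSON response entity body becomes null instead of being
    // silently decoded as UTF-8.
    function jsonCharsetAcceptable(contentType: string | null): boolean {
      if (!contentType) return true;        // no header: decode as UTF-8
      const m = /;\s*charset\s*=\s*"?([^";]+)"?/i.exec(contentType);
      if (!m) return true;                  // no charset parameter: decode as UTF-8
      return (m[1] ?? "").trim().toLowerCase() === "utf-8";
    }

    // jsonCharsetAcceptable('application/json; charset=Shift_JIS') === false
    //   -> .response would be null rather than a UTF-8 misdecode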

-- 
Glenn Maynard



Re: [XHR] responseType json

2012-01-06 Thread Julian Reschke

On 2012-01-06 17:20, Glenn Maynard wrote:
 ...

Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF spec.
...


You seem to feel strongly about this (and I might agree for UTF-32).

How about raising this issue in a place where there's an actual chance 
to cause changes? (- IETF apps-discuss)


Best regards, Julian



Re: [XHR] responseType json

2012-01-06 Thread Boris Zbarsky

On 1/6/12 11:20 AM, Glenn Maynard wrote:

Accepting content that other browsers don't will result in pages being
created that work only in WebKit.  That gives the least
interoperability, not the most.


I assume Jarred was talking about interoperability with content, not 
with other browsers.


And thus start most races to the bottom in web-land

-Boris



Re: [XHR] responseType json

2012-01-06 Thread Julian Reschke

On 2012-01-06 17:56, Boris Zbarsky wrote:

On 1/6/12 11:20 AM, Glenn Maynard wrote:

Accepting content that other browsers don't will result in pages being
created that work only in WebKit. That gives the least
interoperability, not the most.


I assume Jarred was talking about interoperability with content, not
with other browsers.

And thus start most races to the bottom in web-land


One could argue that it isn't a race to the bottom when the component 
accepts what is defined as valid (by the media type); and that the real 
problem is that another spec tries to profile that.


Best regards, Julian



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 11:20 AM, Glenn Maynard gl...@zewt.org wrote:

 Please be careful with quote markers; you quoted text written by me as
 written by Glenn Adams.


Sorry, copying from the archives into Gmail is a pain.



 On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls jar...@webkit.org
 wrote:
  I'm getting responseType json landed in WebKit, and going to do so
 without
  the restriction of the JSON source being UTF-8.  We default our decoding
 to
  UTF-8 if none is dictated by the server or overrideMIMEType(), but we
 also
  do BOM detection and will gracefully switch to UTF-16(BE/LE) or
  UTF-32(BE/LE) if the content is encoded as such, and accept the source
  as-is.
 
  It's a matter of having that perfect recipe of easiest implementation +
  most interoperability.  It actually adds complication to our decoder if
 we

 Accepting content that other browsers don't will result in pages being
 created that work only in WebKit.


WebKit is used in many walled garden environments, so we consider these
scenarios, but as a secondary goal to our primary goal of being a standards
compliant browser engine.  The point being, there will always be content
that's created solely for WebKit, so that's not a good argument to make.
 So generally speaking, if someone is aiming to create content that's
x-browser compatible, they'll do just that and use the least common
denominators.


  That gives the least
 interoperability, not the most.


 If this behavior gets propagated into other browsers, that's even
 worse.  Gecko doesn't support UTF-32, and adding it would be a huge
 step backwards.


We're not adding anything here, it's a matter of complicating and taking
away from our decoder for one particular case.  You're acting like we're
adding UTF-32 support for the first time.



  do something special just for (perfectly legit) JSON payloads.  I think
  keeping that UTF-8 bit in the spec is fine, but I don't think WebKit
 will be
  reducing our interoperability and complicating our code base.  If we
 don't
  want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the JSON
  grammar and JSON.parse will do the leg work.

 Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF
 spec.


So let's change the IETF spec as well - are we even fighting that battle
yet?



 Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.






  As someone else stated, this is a good fight but probably not the right
 battlefield.

 Strongly disagree.  Preventing legacy messes from being perpetuated
 into new APIs is one of the *only* battlefields available, where we
 can get people to stop using legacy encodings without breaking
 existing content.


"without breaking existing content" - and yet killing UTF-16 and UTF-32
support just for responseType json would break existing UTF-16 and UTF-32
JSON.  Well, which is it?

Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
for the web platform.  But it's also plausible to push these restrictions
not just in one spot in XHR, but across the web platform and also where the
web platform defers to external specs (e.g. JSON).  In this particular
case, an author will be more likely to just use responseText + JSON.parse
for content he/she cannot control - the content won't end up changing and
our initiative is circumvented.

I suggest taking this initiative elsewhere (at least in parallel), i.e.,
getting RFC4627 to only support UTF-8 encoding if that's the larger
picture.  To say that a legit JSON source can be stored as any Unicode
encoding but can only be transported as UTF-8 in this one particular XHR
case is inconsistent and only leads to worse interoperability and confusion
for those looking up these specs - if I go to the JSON spec first, I'll see all
those encodings are supported and wonder why it doesn't work in this one
instance.  Are we out to totally confuse the hell out of authors?



 Anne: There's one related change I'd suggest.  Currently, if a JSON
 response says Content-Type: application/json; charset=Shift_JIS,
 the explicit charset will be silently ignored and UTF-8 will be used.
 I think this should be explicitly rejected, returning null as the JSON
 response entity body.  Don't decode as UTF-8 despite an explicitly
 conflicting header, or people will start sending bogus charset values
 without realizing it.


+1


 --
 Glenn Maynard


Re: [XHR] responseType json

2012-01-06 Thread Boris Zbarsky

On 1/6/12 12:13 PM, Jarred Nicholls wrote:

WebKit is used in many walled garden environments, so we consider these
scenarios, but as a secondary goal to our primary goal of being a
standards compliant browser engine.  The point being, there will always
be content that's created solely for WebKit, so that's not a good
argument to make.  So generally speaking, if someone is aiming to create
content that's x-browser compatible, they'll do just that and use the
least common denominators.


People never aim to create content that's cross-browser compatible per 
se, with a tiny minority of exceptions.


People aim to create content that reaches users.

What that means is that right now people are busy authoring webkit-only 
websites on the open web because they think that webkit is the only UA 
that will ever matter on mobile.  And if you point out this assumption 
to these people, they will tell you right to your face that it's a 
perfectly justified assumption.  The problem is bad enough that both 
Trident and Gecko have seriously considered implementing support for 
some subset of -webkit CSS properties.  Note that "people" here includes 
divisions of Google.


As a result, any time WebKit deviates from standards, that _will_ 100% 
guaranteed cause sites to be created that depend on those deviations; 
the other UAs then have the choice of not working on those sites or 
duplicating the deviations.


We've seen all this before, circa 2001 or so.

Maybe in this particular case it doesn't matter, and maybe the spec in 
this case should just change, but if so, please argue for that, as the 
rest of your mail does, not for the principle of shipping random spec 
violations just because you want to.   In general if WebKit wants to do 
special webkitty things in walled gardens that's fine.  Don't pollute 
the web with them if it can be avoided.  Same thing applies to other 
UAs, obviously.


-Boris



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 3:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 1/6/12 12:13 PM, Jarred Nicholls wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a
 standards compliant browser engine.  The point being, there will always
 be content that's created solely for WebKit, so that's not a good
 argument to make.  So generally speaking, if someone is aiming to create
 content that's x-browser compatible, they'll do just that and use the
 least common denominators.


 People never aim to create content that's cross-browser compatible per se,
 with a tiny minority of exceptions.

 People aim to create content that reaches users.

 What that means is that right now people are busy authoring webkit-only
 websites on the open web because they think that webkit is the only UA that
 will ever matter on mobile.  And if you point out this assumption to these
 people, they will tell you right to your face that it's a perfectly
 justified assumption.  The problem is bad enough that both Trident and
 Gecko have seriously considered implementing support for some subset of
 -webkit CSS properties.  Note that people here includes divisions of
 Google.

 As a result, any time WebKit deviates from standards, that _will_ 100%
 guaranteed cause sites to be created that depend on those deviations; the
 other UAs then have the choice of not working on those sites or duplicating
 the deviations.

 We've seen all this before, circa 2001 or so.

 Maybe in this particular case it doesn't matter, and maybe the spec in
 this case should just change, but if so, please argue for that, as the rest
 of your mail does, not for the principle of shipping random spec violations
 just because you want to.


I think my entire mail was quite clear that the spec is inconsistent with
rfc4627 and perhaps that's where the changes need to happen, or else yield
to it.  Let's not be dogmatic here, I'm just pointing out the obvious
disconnect.

This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.  This is a 2-way street, and often times
it's the spec that needs to change, not the implementation.  The point is,
there needs to be a very compelling reason to breach the contract of a
media type's existing spec that would yield inconsistent results from the
rest of the web platform layers, and involve taking away functionality that
is working perfectly fine and can handle all the legit content that's
already out there (as rare as it might be).

Let's get Crockford on our side, let him know there's a lot of support for
banishing UTF-16 and UTF-32 forever and change rfc4627.


   In general if WebKit wants to do special webkitty things in walled
 gardens that's fine.  Don't pollute the web with them if it can be avoided.
  Same thing applies to other UAs, obviously.


IE and WebKit have gracefully handled UTF-32 for a long time in other parts
of the platform, and despite it being an unsupported codec of the HTML
spec, they've continued to do so.  I've had nothing to do with this, so I'm
not to be held responsible for its present perpetuation ;)  My argument is
focused on the JSON media type's spec, which the XHR draft blatantly contradicts.




 -Boris




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2012-01-06 Thread Ms2ger

On 01/06/2012 10:28 PM, Jarred Nicholls wrote:

This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.


With this kind of attitude, frankly, you shouldn't be implementing a spec.

HTH
Ms2ger




Re: [XHR] responseType json

2012-01-06 Thread Ojan Vafai
On Fri, Jan 6, 2012 at 12:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 1/6/12 12:13 PM, Jarred Nicholls wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a
 standards compliant browser engine.  The point being, there will always
 be content that's created solely for WebKit, so that's not a good
 argument to make.  So generally speaking, if someone is aiming to create
 content that's x-browser compatible, they'll do just that and use the
 least common denominators.


 People never aim to create content that's cross-browser compatible per se,
 with a tiny minority of exceptions.

 People aim to create content that reaches users.

 What that means is that right now people are busy authoring webkit-only
 websites on the open web because they think that webkit is the only UA that
 will ever matter on mobile.  And if you point out this assumption to these
 people, they will tell you right to your face that it's a perfectly
 justified assumption.  The problem is bad enough that both Trident and
 Gecko have seriously considered implementing support for some subset of
 -webkit CSS properties.  Note that people here includes divisions of
 Google.

 As a result, any time WebKit deviates from standards, that _will_ 100%
 guaranteed cause sites to be created that depend on those deviations; the
 other UAs then have the choice of not working on those sites or duplicating
 the deviations.

 We've seen all this before, circa 2001 or so.

 Maybe in this particular case it doesn't matter, and maybe the spec in
 this case should just change, but if so, please argue for that, as the rest
 of your mail does, not for the principle of shipping random spec violations
 just because you want to.   In general if WebKit wants to do special
 webkitty things in walled gardens that's fine.  Don't pollute the web with
 them if it can be avoided.  Same thing applies to other UAs, obviously.


I'm ambivalent about whether we should restrict to utf8 or not. On the one
hand, having everyone on utf8 would greatly simplify the web. On the other
hand, I can imagine this hurting download size for japanese/chinese
websites (i.e. they'd want utf-16).

I agree with Boris that we don't need to pollute the web if we want to
expose this to WebKit's walled-garden environments. We have mechanisms for
exposing things only to those environments specifically to avoid this
problem. Let's keep this discussion focused on what's best for the web. We
can make WebKit do the appropriate thing.


Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote:

 On 01/06/2012 10:28 PM, Jarred Nicholls wrote:

 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.


 With this kind of attitude, frankly, you shouldn't be implementing a spec.


I resent that comment, because I'm one of the few that fight in WebKit to
get us 100% spec compliant in XHR (don't even get me started with how many
violations there are in Firefox, IE, and Opera...WebKit isn't the only
one mind you), but that doesn't mean any spec addition, as fluid as it is
in the early stages, is gospel.  In this case I simply think it wasn't
debated enough before going in - actually it wasn't debated at all, it was
just placed in there and now I'm a bad guy for pointing out its disconnect?
 I think your attitude is far poorer.

The web platform changes all the time - if this matter is shored up, then
implementations will change accordingly.



 HTH
 Ms2ger





Re: [XHR] responseType json

2012-01-06 Thread Bjoern Hoehrmann
* Jarred Nicholls wrote:
This is an editor's draft of a spec, it's not a recommendation, so it's
hardly a violation of anything.  This is a 2-way street, and often times
it's the spec that needs to change, not the implementation.  The point is,
there needs to be a very compelling reason to breach the contract of a
media type's existing spec that would yield inconsistent results from the
rest of the web platform layers, and involve taking away functionality that
is working perfectly fine and can handle all the legit content that's
already out there (as rare as it might be).

You have yet to explain how you propose WebKit should behave, and it is
rather unclear to me whether the proposed behavior is in line with the
existing HTTP, MIME, and JSON specifications. An HTTP response with

  Content-Type: application/json;charset=iso-8859-15

for instance must not be treated as ISO-8859-15 encoded as there is no
charset parameter for the application/json media type, and there is no
other reason to treat it as ISO-8859-15, so it's either an error, or
you silently ignore the unrecognized parameter.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:58 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

  Long experience shows that people who say things like "I'm going to code
  against the Rec
  instead of the draft, because the Rec is more stable"


I know that's a common error, but I never said I was going against a Rec.
 My point was that the editor's draft is fluid enough that it can be
debated and changed, as it's clearly not perfect at any point in time.
 Debating a change to it doesn't put anyone in the wrong, and certainly
doesn't mean I'm violating it - because tomorrow, my proposed violation
could be the current state of the spec.



 RFC4627, for example, is six years old.  This was right about the
  beginning of the time when "UTF-8 everywhere, dammit" was really
 starting to gain hold as a reasonable solution to encoding hell.
 Crockford, as well, is not a browser dev, nor is he closely connected
 to browser devs in a capacity that would really inform him of why
 supporting multiple encodings on the web is so painful.  So, looking
 to that RFC for guidance on current best-practice is not a good idea.

 This issue has been debated and argued over for a long time, far
 predating the current XHR bit.  There's a reason why new file formats
 produced in connection with web stuff are utf8-only.  It's good for
 the web if we're consistent about this.


 ~TJ



Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Fri, Jan 6, 2012 at 4:54 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 * Jarred Nicholls wrote:
 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.  This is a 2-way street, and often times
 it's the spec that needs to change, not the implementation.  The point is,
 there needs to be a very compelling reason to breach the contract of a
 media type's existing spec that would yield inconsistent results from the
 rest of the web platform layers, and involve taking away functionality
 that
 is working perfectly fine and can handle all the legit content that's
 already out there (as rare as it might be).

 You have yet to explain how you propose WebKit should behave, and it is
 rather unclear to me whether the proposed behavior is in line with the
 existing HTTP, MIME, and JSON specifications. An HTTP response with

  Content-Type: application/json;charset=iso-8859-15

 for instance must not be treated as ISO-8859-15 encoded as there is no
 charset parameter for the application/json media type, and there is no
 other reason to treat it as ISO-8859-15, so it's either an error, or
 you silently ignore the unrecognized parameter.


I think the spec should clarify this.  I agree with Glenn Maynard's
proposal: if a server sends a specific charset to use that isn't UTF-8, we
should explicitly reject it, never decode or parse the text and return
null.  Silently decoding in UTF-8 when the server or author is dictating
something different could cause confusion.


 --
 Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
 Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote:

 WebKit is used in many walled garden environments, so we consider these
 scenarios, but as a secondary goal to our primary goal of being a standards
 compliant browser engine.  The point being, there will always be content
 that's created solely for WebKit, so that's not a good argument to make.
  So generally speaking, if someone is aiming to create content that's
 x-browser compatible, they'll do just that and use the least common
 denominators.


If you support UTF-16 here, then people will use it.  That's always the
pattern on the web--one browser implements something extra, and everyone
else ends up having to implement it--whether or not it was a good
idea--because people accidentally started depending on it.  I don't know
why we have to keep repeating this mistake.

We're not adding anything here, it's a matter of complicating and taking
 away from our decoder for one particular case.  You're acting like we're
 adding UTF-32 support for the first time.


Of course you are; you're adding UTF-16 and UTF-32 support to the
responseType == json API.

Also, since JSON uses zero-byte detection, which isn't used by HTML at all,
you'd still need code in your decoder to support that--which means you're
forcing everyone else to complicate *their* decoders with this special case.

XHR's behavior, if the change I suggested is accepted, shouldn't require
special cases in a decoding layer.  I'd have the decoder expose the final
encoding in use (which I'd expect to be available already), and when
.response is queried, return null if the final encoding used by the decoder
wasn't UTF-8.  This means the decoding would still take place for other
encodings, but the end result would be discarded by XHR.  This puts the
handling for this restriction within the XHR layer, rather than at the
decoder layer.
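
To make that layering concrete, here is a hypothetical TypeScript-style sketch of  
the split; the 'finalEncoding' field and surrounding shapes are illustrations of  
the idea, not actual WebKit or spec structures:

    // The decoder reports whichever encoding it actually ended up using;
    // the XHR layer then discards the result for responseType "json"
    // unless that final encoding was UTF-8.
    interface DecodedText {
      text: string;
      finalEncoding: string;   // e.g. "utf-8", "utf-16le", ...
    }

    function jsonFromDecoded(decoded: DecodedText): unknown {
      if (decoded.finalEncoding.toLowerCase() !== "utf-8") {
        return null;           // restriction enforced in XHR, not in the decoder
      }
      try {
        return JSON.parse(decoded.text);
      } catch {
        return null;           // a parse failure also yields null
      }
    }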

I said:

  Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.


Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte
detection.  My question remains, though: what exactly are you doing?  Do
you do zero-byte detection?  Do you do BOM detection?  What's the order of
precedence between zero-byte and/or BOM detection, HTTP Content-Type
headers, and overrideMimeType if they disagree?  All of this would need to
be specified; currently none of it is.
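
For reference, the rfc4627 section 3 zero-byte pattern, sketched in TypeScript  
(real implementations differ, and this deliberately ignores BOMs):

    // rfc4627 section 3: the first two characters of a JSON text are always
    // ASCII, so the pattern of zero bytes in the first four octets identifies
    // the encoding (no BOM involved).
    function detectJsonEncoding(b: Uint8Array): string {
      if (b.length < 4) return "utf-8";   // sketch: punt on short input
      if (b[0] === 0 && b[1] === 0 && b[2] === 0 && b[3] !== 0) return "utf-32be";
      if (b[0] === 0 && b[1] !== 0 && b[2] === 0 && b[3] !== 0) return "utf-16be";
      if (b[0] !== 0 && b[1] === 0 && b[2] === 0 && b[3] === 0) return "utf-32le";
      if (b[0] !== 0 && b[1] === 0 && b[2] !== 0 && b[3] === 0) return "utf-16le";
      return "utf-8";
    }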



 without breaking existing content and yet killing UTF-16 and UTF-32
 support just for responseType json would break existing UTF-16 and UTF-32
 JSON.  Well, which is it?


This is a new feature; there isn't yet existing content using a
responseType of json to be broken.

 Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
 for the web platform.  But it's also plausible to push these restrictions
 not just in one spot in XHR, but across the web platform


I've yet to see a workable proposal to do this across the web platform, due
to backwards-compatibility.  That's why it's being done more narrowly,
where it can be done without breaking existing pages.  If you have any
novel ideas to do this across the platform, I guarantee everyone on the
list would like to hear them.  Failing that, we should do what we can where
we can.

and also where the web platform defers to external specs (e.g. JSON).  In
 this particular case, an author will be more likely to just use
 responseText + JSON.parse for content he/she cannot control - the content
 won't end up changing and our initiative is circumvented.


Of course not.  It tells the developer that something's wrong, and he has
the choice of working around it or fixing his service.  If just 25% of
those people make the right choice, this is a win.  It also helps
discourage new services from being written using legacy encodings.  We
can't stop people from doing the wrong thing, but that doesn't mean we
shouldn't point people in the right direction.

This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.


This is the worst thing I've seen anyone say in here in a long time.

On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke julian.resc...@gmx.dewrote:

 One could argue that it isn't a race to the bottom when the component
 accepts what is defined as valid (by the media type); and that the real
 problem is that another spec tries to profile that.


First off, it's common and perfectly normal for an API exposing features
from another spec to explicitly limit the allowed profile of that spec.
Saying JSON through this API must be UTF-8 is perfectly OK.

Second, this 

Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls


Sent from my iPhone

On Jan 6, 2012, at 7:11 PM, Glenn Maynard gl...@zewt.org wrote:

 On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls jar...@webkit.org wrote:
 WebKit is used in many walled garden environments, so we consider these 
 scenarios, but as a secondary goal to our primary goal of being a standards 
 compliant browser engine.  The point being, there will always be content 
 that's created solely for WebKit, so that's not a good argument to make.  So 
 generally speaking, if someone is aiming to create content that's x-browser 
 compatible, they'll do just that and use the least common denominators.
 
 If you support UTF-16 here, then people will use it.  That's always the 
 pattern on the web--one browser implements something extra, and everyone else 
 ends up having to implement it--whether or not it was a good idea--because 
 people accidentally started depending on it.  I don't know why we have to 
 keep repeating this mistake.
 
 We're not adding anything here, it's a matter of complicating and taking 
 away from our decoder for one particular case.  You're acting like we're 
 adding UTF-32 support for the first time.
 
 Of course you are; you're adding UTF-16 and UTF-32 support to the 
 responseType == json API.
 
 Also, since JSON uses zero-byte detection, which isn't used by HTML at all, 
 you'd still need code in your decoder to support that--which means you're 
 forcing everyone else to complicate *their* decoders with this special case.
 
 XHR's behavior, if the change I suggested is accepted, shouldn't require 
 special cases in a decoding layer.  I'd have the decoder expose the final 
 encoding in use (which I'd expect to be available already), and when 
 .response is queried, return null if the final encoding used by the decoder 
 wasn't UTF-8.  This means the decoding would still take place for other 
 encodings, but the end result would be discarded by XHR.  This puts the 
 handling for this restriction within the XHR layer, rather than at the 
 decoder layer.

That's why I'd like to see the spec changed to clarify the discarding if the 
encoding was supplied and isn't UTF-8.

 
 I said:
 Also, I'm a bit confused.  You talk about the rudimentary encoding
 detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
 mechanisms (HTTP headers and overrideMimeType).  These are separate
 and unrelated.  If you're using HTTP mechanisms, then the JSON spec
 doesn't enter into it.  If you're using both HTTP headers (HTTP) and
 UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
 two.  I can't tell what mechanism you're actually using.
 
 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte 
 detection.  My question remains, though: what exactly are you doing?  Do you 
 do zero-byte detection?  Do you do BOM detection?  What's the order of 
 precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, 
 and overrideMimeType if they disagree?  All of this would need to be 
 specified; currently none of it is.

None of that matters if a specific codec is the be-all and end-all.  If that's the 
consensus then that's it, period.

WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; 
the XHR payload runs through it before it is passed to JSON.parse.  Read the 
code if you're interested.  I would need to change the text decoder to skip BOM 
detection for this one case, unless the spec added that wording about discarding 
when the encoding != UTF-8, in which case the restriction can be enforced 
entirely in XHR with no decoder changes.  I don't want to get hung up on 
explaining WebKit's specific impl. details.

 
  
 without breaking existing content and yet killing UTF-16 and UTF-32 support 
 just for responseType json would break existing UTF-16 and UTF-32 JSON.  
 Well, which is it?
 
 This is a new feature; there isn't yet existing content using a responseType 
 of json to be broken.
 
 Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for 
 the web platform.  But it's also plausible to push these restrictions not 
 just in one spot in XHR, but across the web platform
 
 I've yet to see a workable proposal to do this across the web platform, due 
 to backwards-compatibility.  That's why it's being done more narrowly, where 
 it can be done without breaking existing pages.  If you have any novel ideas 
 to do this across the platform, I guarantee everyone on the list would like 
 to hear them.  Failing that, we should do what we can where we can.
 
 and also where the web platform defers to external specs (e.g. JSON).  In 
 this particular case, an author will be more likely to just use responseText 
 + JSON.parse for content he/she cannot control - the content won't end up 
 changing and our initiative is circumvented.
 
 Of course not.  It tells the developer that something's wrong, and he has the 
 choice of working around it or fixing his service.  If just 25% of those 
 people make the right choice, this is a 

Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote:

 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte
 detection.  My question remains, though: what exactly are you doing?  Do
 you do zero-byte detection?  Do you do BOM detection?  What's the order of
 precedence between zero-byte and/or BOM detection, HTTP Content-Type
 headers, and overrideMimeType if they disagree?  All of this would need to
 be specified; currently none of it is.


 None of that matters if a specific codec is the one all be all.  If that's
 the consensus then that's it, period.

 WebKit shares a single text decoder globally for HTML, XML, plain text,
 etc. the XHR payload runs through it before it would pass to JSON.parse.
  Read the code if you're interested.  I would need to change the text
 decoder to skip BOM detection for this one case unless the spec added that
 wording of discarding when encoding != UTF-8, then that can be enforced all
 in XHR with no decoder changes.  I don't want to get hung on explaining
 WebKit's specific impl. details.


All of the details I asked about are user-visible, not WebKit
implementation details, and would need to be specified if encodings other
than UTF-8 were allowed.  I do think this should remain UTF-8 only, but if
you want to discuss allowing other encodings, these are things that would
need to be defined (which requires a clear proposal, not "read the code").

I assume it's not using the exact same decoder logic as HTML.  After all,
that would allow non-Unicode encodings.

-- 
Glenn Maynard


Re: [XHR] responseType json

2012-01-06 Thread Jarred Nicholls
On Jan 6, 2012, at 8:10 PM, Glenn Maynard gl...@zewt.org wrote:

 On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls jar...@webkit.org wrote:
 Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte 
 detection.  My question remains, though: what exactly are you doing?  Do you 
 do zero-byte detection?  Do you do BOM detection?  What's the order of 
 precedence between zero-byte and/or BOM detection, HTTP Content-Type 
 headers, and overrideMimeType if they disagree?  All of this would need to 
 be specified; currently none of it is.
 
 None of that matters if a specific codec is the one all be all.  If that's 
 the consensus then that's it, period.
 
 WebKit shares a single text decoder globally for HTML, XML, plain text, etc. 
 the XHR payload runs through it before it would pass to JSON.parse.  Read the 
 code if you're interested.  I would need to change the text decoder to skip 
 BOM detection for this one case unless the spec added that wording of 
 discarding when encoding != UTF-8, then that can be enforced all in XHR with 
 no decoder changes.  I don't want to get hung on explaining WebKit's specific 
 impl. details.
 
 All of the details I asked about are user-visible, not WebKit implementation 
 details, and would need to be specified if encodings other than UTF-8 were 
 allowed.  I do think this should remain UTF-8 only, but if you want to 
 discuss allowing other encodings, these are things that would need to be 
 defined (which requires a clear proposal, not read the code).

Of course - I apologize, I didn't mean it as a dismissal; I just figured that if we 
are settled on one codec then I'd spare ourselves the time.  I'm also mobile :) 
I could provide you those details if no decoding changes (enforcement) were 
done in WebKit, if you'd like.  But since this is a new API, might as well just 
stick to UTF-8.

 
 I assume it's not using the exact same decoder logic as HTML.  After all, 
 that would allow non-Unicode encodings.

Not exact, but close.  For discussion's sake and in this context, you could 
call it the Unicode text decoder that does BOM detection and switches Unicode 
codecs automatically.  For enforced UTF-8 I'd (have to) disable the BOM 
detection, but additionally could avoid decoding altogether if the specified 
encoding is not explicitly UTF-8 (and that was a part of the spec).  We'll make 
it work either way :)

 
 -- 
 Glenn Maynard
 


Re: [XHR] responseType json

2012-01-06 Thread Tab Atkins Jr.
On Fri, Jan 6, 2012 at 1:45 PM, Jarred Nicholls jar...@webkit.org wrote:
 On Fri, Jan 6, 2012 at 4:34 PM, Ms2ger ms2...@gmail.com wrote:
 On 01/06/2012 10:28 PM, Jarred Nicholls wrote:
 This is an editor's draft of a spec, it's not a recommendation, so it's
 hardly a violation of anything.

 With this kind of attitude, frankly, you shouldn't be implementing a spec.

 I resent that comment, because I'm one of the few that fight in WebKit to
 get us 100% spec compliant in XHR (don't even get me started with how many
 violations there are in Firefox, IE, and Opera...WebKit isn't the only one
 mind you), but that doesn't mean any spec addition, as fluid as it is in the
 early stages, is gospel.  In this case I simply think it wasn't debated
 enough before going in - actually it wasn't debated at all, it was just
 placed in there and now I'm a bad guy for pointing out its disconnect?  I
 think your attitude is far poorer.

 The web platform changes all the time - if this matter is sured up, then
 implementations will change accordingly.

While Ms2ger was a bit short, there's a reason.  Long experience shows
that people who say things like "I'm going to code against the Rec
instead of the draft, because the Rec is more stable" often end up
causing pain for everyone else, because that more stable Rec is also
*more wrong*, precisely because stable means hasn't been updated to
take into account new information or to fix bugs.  This happens even
for smaller differences - well-meaning devs coding to the Working
Draft of a spec on /TR instead of the error-corrected Editor's Draft
cause never-ending pain.

Old RFCs are also often a source of pain, because we quite often find
that the authors aren't fully versed in the complexities and
subtleties of the public web.  They may be operating from an academic
or corporate standpoint, or otherwise be contained in a local
experience-minimum that affects their view of what's reasonable.

RFC4627, for example, is six years old.  This was right about the
beginning of the time when "UTF-8 everywhere, dammit" was really
starting to gain hold as a reasonable solution to encoding hell.
Crockford, as well, is not a browser dev, nor is he closely connected
to browser devs in a capacity that would really inform him of why
supporting multiple encodings on the web is so painful.  So, looking
to that RFC for guidance on current best-practice is not a good idea.

This issue has been debated and argued over for a long time, far
predating the current XHR bit.  There's a reason why new file formats
produced in connection with web stuff are utf8-only.  It's good for
the web if we're consistent about this.

~TJ



Re: [XHR] responseType json

2012-01-06 Thread Tab Atkins Jr.
On Fri, Jan 6, 2012 at 12:36 PM, Ojan Vafai o...@chromium.org wrote:
 I'm ambivalent about whether we should restrict to utf8 or not. On the one
 hand, having everyone on utf8 would greatly simplify the web. On the other
 hand, I can imagine this hurting download size for japanese/chinese websites
 (i.e. they'd want utf-16).

Note that this may be subject to the same counter-intuitive forces
that cause UTF-8 to usually be better for CJK HTML pages (because a
lot of the source is ASCII markup).  In JSON, all of the markup
artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII,
along with numbers, bools, and null.  Only the contents of strings can
be non-ascii.

JSON is generally lighter on markup than XML-like languages, so the
effect may not be as pronounced, but it shouldn't be dismissed without
some study.  At minimum, it will *reduce* the size difference between
the two.
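
A quick way to eyeball this for a given payload (illustrative TypeScript; UTF-16  
size is approximated as two bytes per code unit, ignoring any BOM):

    // Rough size comparison for a JSON payload: UTF-8 via TextEncoder,
    // UTF-16 approximated as 2 bytes per UTF-16 code unit.
    function compareSizes(json: string): { utf8: number; utf16: number } {
      const utf8 = new TextEncoder().encode(json).length;
      const utf16 = json.length * 2;
      return { utf8, utf16 };
    }

    // ASCII markup and keys stay 1 byte in UTF-8 but cost 2 bytes in UTF-16,
    // which offsets the 3-bytes-per-CJK-character cost of UTF-8.
    compareSizes('{"title":"メインページ","count":42}');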

~TJ



Re: [XHR] responseType json

2012-01-06 Thread Glenn Maynard
On Fri, Jan 6, 2012 at 4:45 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 Note that this may be subject to the same counter-intuitive forces
 that cause UTF-8 to usually be better for CJK HTML pages (because a
 lot of the source is ASCII markup).  In JSON, all of the markup
 artifacts (braces, brackets, quotes, colon, commas, spaces) are ASCII,
 along with numbers, bools, and null.  Only the contents of strings can
 be non-ascii.

 JSON is generally lighter on markup than XML-like languages, so the
 effect may not be as pronounced, but it shouldn't be dismissed without
 some study.  At minimum, it will *reduce* the size difference between
 the two.


And more fundamentally, this is trying to repurpose charsets as a
compression mechanism.  If you want compression, use compression
(Transfer-Encoding: gzip):

-rw-rw-r-- 1 glenn glenn 7274 Jan 06 23:59 test-utf8.txt
-rw-rw-r-- 1 glenn glenn 3672 Jan 06 23:59 test-utf8.txt.gz
-rw-rw-r-- 1 glenn glenn 6150 Jan 06 23:59 test-utf16.txt
-rw-rw-r-- 1 glenn glenn 3468 Jan 06 23:59 test-utf16.txt.gz

The difference even without compression isn't enough to warrant the
complexity (~15%), and with compression the difference is under 10%.

(Test case is simply copying the rendered text from
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
in Firefox.)

-- 
Glenn Maynard


Re: [XHR] responseType json

2011-12-13 Thread Henri Sivonen
On Mon, Dec 12, 2011 at 7:08 PM, Jarred Nicholls jar...@sencha.com wrote:
 There's no feeding (re: streaming) of data to a parser, it's buffered until
 the state is DONE (readyState == 4) and then an XML doc is created upon the
 first access to responseXML or response.  Same will go for the JSON parser
 in our first iteration of implementing the json responseType.

FWIW, Gecko parses XML and HTML in a streaming way as data arrives
from the network. When readyState changes to DONE, the document has
already been parsed.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] responseType json

2011-12-12 Thread Anne van Kesteren
On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls jar...@sencha.com  
wrote:

I understand that's how you spec'ed it, but it's not how it's implemented
in IE nor WebKit for legacy purposes - which is what I meant in the above
statement.


What do you mean legacy purposes? responseType is a new feature. And we  
added it in this way in part because of feedback from the WebKit community  
that did not want to keep the raw data around.


In the thread where we discussed adding it the person working on it for  
WebKit did seem to plan on implementing it per the specification:


http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799



In WebKit and IE <= 9, a responseType of "", "text",
or "document" means access to both responseXML and responseText.  I don't
know what IE10's behavior is yet.


IE8 could not have supported this feature and for IE9 I could not find any  
documentation. Are you sure they implemented it?



Given that Gecko does the right thing and Opera will too (next major  
release I believe) I do not really see any reason to change the  
specification.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-12 Thread Jarred Nicholls
I'd like to bring up an issue with the spec with regards to responseText +
the new json responseType.  Currently it is written that responseText
should throw an exception if the responseType is not  or text.  I would
argue that responseText should also return the plain text when the type is
json.

Take the scenario of debugging an application, or an application that has a
Error Reporting feature; If XHR.response returns null, meaning the JSON
payload was not successfully parsed and/or was invalid, there is no means
to retrieve the plain text that caused the error.  null is rather useless
at that point.  See my WebKit bug for more context:
https://bugs.webkit.org/show_bug.cgi?id=73648

For legacy reasons, responseText and responseXML continue to work together
despite the responseType that is set.  In other words, a responseType of
text still allows access to responseXML, and responseType of document
still allows access to responseText.  And it makes sense that this is so;
if a strong-typed Document from responseXML is unable to be created,
responseText is the fallback to get the payload and either debug it, submit
it as an error report, etc.  I would argue that json responseType would
be more valuable if it behaved the same.  Unlike the binary types
(ArrayBuffer, Blob), json and document are backed by a plain text
payload and therefore responseText has value in being accessible.

If all we can get on a bad JSON response is null, I think there is little
incentive for anyone to use the json type when they can use text and
JSON.parse it themselves.
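
For clarity, that fallback pattern looks roughly like the following TypeScript  
sketch, where the URL and the error-reporting hook are made up for illustration:

    // Hypothetical error-reporting hook; only here to make the sketch compile.
    declare function sendErrorReport(report: { payload: string; error: string }): void;

    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/api/data.json");        // illustrative URL
    xhr.responseType = "text";
    xhr.onload = () => {
      try {
        const data = JSON.parse(xhr.responseText);
        // ... use data ...
      } catch (e) {
        // With responseType "text" the raw payload is still available for
        // debugging or an error report; with "json" it would not be.
        sendErrorReport({ payload: xhr.responseText, error: String(e) });
      }
    };
    xhr.send();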

Comments, questions, and flames are welcomed!

Thanks,
Jarred



Re: [XHR] responseType json

2011-12-12 Thread Henri Sivonen
On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com wrote:
  A good compromise would be to only throw it away (and thus restrict
 responseText access) upon the first successful parse when accessing
 .response.

I disagree. Even though, conceptually, the spec says that you first
accumulate text and then invoke JSON.parse, I think we should
allow for implementations that feed an incremental JSON parser as data
arrives from the network and throw away each input buffer after
pushing it to the incremental JSON parser.

That is, in order to allow more memory-efficient implementations in
the future, I think we shouldn't expose responseText for JSON.
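
In script-visible terms that design means roughly the following (a sketch
assuming the spec'ed behaviour; the /api/data URL is hypothetical):

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/api/data", true);
  xhr.responseType = "json";
  xhr.onload = function () {
    var obj = xhr.response;         // parsed result, or null on parse failure
    try {
      var text = xhr.responseText;  // throws for "json" per the spec...
    } catch (e) {
      // ...so an implementation is free to discard the raw bytes as it goes.
    }
  };
  xhr.send();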

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [XHR] responseType json

2011-12-12 Thread Jarred Nicholls
On Mon, Dec 12, 2011 at 5:37 AM, Anne van Kesteren ann...@opera.com wrote:

 On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls jar...@sencha.com
 wrote:

 I understand that's how you spec'ed it, but it's not how it's implemented
 in IE nor WebKit for legacy purposes - which is what I meant in the above
 statement.


 What do you mean legacy purposes? responseType is a new feature. And we
 added it in this way in part because of feedback from the WebKit community
 that did not want to keep the raw data around.


I wasn't talking about responseType, I was referring to the pair of
responseText and responseXML being accessible together since the dawn of
time.  I don't know why WebKit and IE didn't take the opportunity to use
responseType and kill that behavior; don't ask me, I wasn't responsible for
it ;)



 In the thread where we discussed adding it the person working on it for
 WebKit did seem to plan on implementing it per the specification:

 http://lists.w3.org/Archives/**Public/public-webapps/**
 2010OctDec/thread.html#msg799http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799


Clearly not - shame, because now I'm trying to clean up the mess.





  In WebKit and IE <= 9, a responseType of "", "text",
 or "document" means access to both responseXML and responseText.  I don't
 know what IE10's behavior is yet.


 IE8 could not have supported this feature and for IE9 I could not find any
 documentation. Are you sure they implemented it?


I'm not positive if they did to be honest - I haven't found it documented
anywhere.




 Given that Gecko does the right thing and Opera will too (next major
 release I believe) I do not really see any reason to change the
 specification.


I started an initiative to bring XHR in WebKit up-to-spec (see
https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push back.
 All I'm asking is that if I run into push back again, that I can send them
your way ;)





 --
 Anne van Kesteren
 http://annevankesteren.nl/




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-12 Thread Jarred Nicholls
On Mon, Dec 12, 2011 at 6:39 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Sun, Dec 11, 2011 at 4:08 PM, Jarred Nicholls jar...@sencha.com
 wrote:
   A good compromise would be to only throw it away (and thus restrict
  responseText access) upon the first successful parse when accessing
  .response.

 I disagree. Even though conceptually, the spec says that you first
 accumulate text and then you invoke JSON.parse, I think we should
 allow for implementations that feed an incremental JSON parser as data
 arrives from the network and throws away each input buffer after
 pushing it to the incremental JSON parser.

 That is, in order to allow more memory-efficient implementations in
 the future, I think we shouldn't expose responseText for JSON.


I'm completely down with that.  It still leaves an unsatisfied use case,
but one that, after a nice weekend of relaxation, I no longer care about.



 --
 Henri Sivonen
 hsivo...@iki.fi
 http://hsivonen.iki.fi/




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-12 Thread Anne van Kesteren
On Mon, 12 Dec 2011 14:12:57 +0100, Jarred Nicholls jar...@sencha.com  
wrote:

I started an initiative to bring XHR in WebKit up-to-spec (see
https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push  
back. All I'm asking is that if I run into push back again, that I can  
send them your way ;)


So a) thanks a lot for doing that and b) please do send them here.  
Discussing the XMLHttpRequest standard should happen here, not in  
bugs.webkit.org :-)



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-12 Thread Olli Pettay

On 12/12/2011 03:12 PM, Jarred Nicholls wrote:

On Mon, Dec 12, 2011 at 5:37 AM, Anne van Kesteren ann...@opera.com
mailto:ann...@opera.com wrote:

On Sun, 11 Dec 2011 15:44:58 +0100, Jarred Nicholls
jar...@sencha.com mailto:jar...@sencha.com wrote:

I understand that's how you spec'ed it, but it's not how it's
implemented
in IE nor WebKit for legacy purposes - which is what I meant in
the above
statement.


What do you mean legacy purposes? responseType is a new feature. And
we added it in this way in part because of feedback from the WebKit
community that did not want to keep the raw data around.


I wasn't talking about responseType, I was referring to the pair of
responseText and responseXML being accessible together since the dawn of
time.


In case responseType is not set. If responseType is set, implementations
can optimize certain things.


 I don't know why WebKit and IE didn't take the opportunity to use
responseType

responseType is a new thing. Gecko hasn't changed behavior in case
responseType is not set.


and kill that behavior; don't ask me, I wasn't responsible
for it ;)


In the thread where we discussed adding it the person working on it
for WebKit did seem to plan on implementing it per the specification:


http://lists.w3.org/Archives/__Public/public-webapps/__2010OctDec/thread.html#msg799

http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/thread.html#msg799


Clearly not - shame, because now I'm trying to clean up the mess.




In WebKit and IE <= 9, a responseType of "", "text", or "document" means
access to both responseXML and responseText.  I don't know what IE10's
behavior is yet.


IE8 could not have supported this feature and for IE9 I could not
find any documentation. Are you sure they implemented it?


I'm not positive if they did to be honest - I haven't found it
documented anywhere.



Given that Gecko does the right thing and Opera will too (next major
release I believe) I do not really see any reason to change the
specification.


I started an initiative to bring XHR in WebKit up-to-spec (see
https://bugs.webkit.org/show_bug.cgi?id=54162) and got a lot of push
back.  All I'm asking is that if I run into push back again, that I can
send them your way ;)




--
Anne van Kesteren
http://annevankesteren.nl/




--


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls






Re: [XHR] responseType json

2011-12-12 Thread Jarred Nicholls
On Mon, Dec 12, 2011 at 9:28 AM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 12/12/11 8:12 AM, Jarred Nicholls wrote:

 I started an initiative to bring XHR in WebKit up-to-spec (see
 https://bugs.webkit.org/show_**bug.cgi?id=54162https://bugs.webkit.org/show_bug.cgi?id=54162)
 and got a lot of push
 back.


 That seems to be about a different issue than responseType, right?

 I just tried the following testcase:

 <script>
  var xhr = new XMLHttpRequest();
  xhr.open("GET", window.location, false);
  xhr.responseType = "document";
  xhr.send();
  try { alert(xhr.responseText); } catch (e) { alert(e); }
  try { alert(xhr.responseXML); } catch (e) { alert(e); }

  xhr.open("GET", window.location, false);
  xhr.responseType = "text";
  xhr.send();
  try { alert(xhr.responseText); } catch (e) { alert(e); }
  try { alert(xhr.responseXML); } catch (e) { alert(e); }
 </script>

 Gecko behavior seems to be per spec: the attempt to get responseText fails
 on the first XHR, and the attempt to get responseXML fails on the second
 XHR.

 WebKit (tested Chrome dev channel and Safari 5.1.1 behavior) seems to be
 partially per spec: the attempt to get responseText throws for the first
 XHR, but the attempt to get the responseXML succeeds for the second XHR.
  That sort of makes sense in terms of how I recall WebKit implementing
 feeding data to their parser in XHR, if the implementation of responseType
 just wasn't very careful.


There's no feeding (re: streaming) of data to a parser; the data is buffered
until the state is DONE (readyState == 4), and then an XML document is created
upon the first access to responseXML or response.  The same will go for the
JSON parser in our first iteration of implementing the "json" responseType.
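
Roughly the shape of that buffer-then-parse-on-first-access approach, sketched
in script rather than WebKit's actual C++ (all names here are illustrative):

  function LazyJSONResponse() {
    this._buffer = "";     // raw text accumulated as data arrives
    this._parsed = null;
    this._done = false;
  }
  LazyJSONResponse.prototype.appendChunk = function (chunk) {
    this._buffer += chunk; // nothing is parsed until DONE + first access
  };
  LazyJSONResponse.prototype.getResponse = function () {
    if (!this._done) {
      this._done = true;
      try { this._parsed = JSON.parse(this._buffer); }
      catch (e) { this._parsed = null; } // null signals a bad payload
    }
    return this._parsed;
  };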



 Given that WebKit already implements the right behavior when responseType
 = document, it sounds like the only bug on their end here is really
 responseType = text handling, right?  It'd definitely be good to just fix
 that...


Yeah I'm going to clean up all the mess.




 -Boris


Thanks!

-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-11 Thread Tab Atkins Jr.
On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com wrote:
 I'd like to bring up an issue with the spec with regards to responseText +
 the new json responseType.  Currently it is written that responseText
 should throw an exception if the responseType is not  or text.  I would
 argue that responseText should also return the plain text when the type is
 json.

 Take the scenario of debugging an application, or an application that has a
 Error Reporting feature; If XHR.response returns null, meaning the JSON
 payload was not successfully parsed and/or was invalid, there is no means to
 retrieve the plain text that caused the error.  null is rather useless at
 that point.  See my WebKit bug for more
 context: https://bugs.webkit.org/show_bug.cgi?id=73648

 For legacy reasons, responseText and responseXML continue to work together
 despite the responseType that is set.  In other words, a responseType of
 text still allows access to responseXML, and responseType of document
 still allows access to responseText.  And it makes sense that this is so; if
 a strong-typed Document from responseXML is unable to be created,
 responseText is the fallback to get the payload and either debug it, submit
 it as an error report, etc.  I would argue that json responseType would be
 more valuable if it behaved the same.  Unlike the binary types (ArrayBuffer,
 Blob), json and document are backed by a plain text payload and
 therefore responseText has value in being accessible.

 If all we can get on a bad JSON response is null, I think there is little
 incentive for anyone to use the json type when they can use text and
 JSON.parse it themselves.

What's the problem with simply setting responseType to 'text' when debugging?

A nice benefit of *not* presenting the text by default is that the
browser can throw the text away immediately, rather than keeping
around the payload in both forms and paying for it twice in memory
(especially since the text form will, I believe, generally be larger
than the JSON form).

~TJ



Re: [XHR] responseType json

2011-12-11 Thread Anne van Kesteren
On Sun, 11 Dec 2011 06:10:26 +0100, Jarred Nicholls jar...@sencha.com  
wrote:
For legacy reasons, responseText and responseXML continue to work  
together despite the responseType that is set.


This is false. responseType "text" allows access to responseText, but not
responseXML; "document" allows access to responseXML, but not responseText.


We made this exclusive to reduce memory usage. I hope that browsers will  
report the JSON errors to the console and I think at some point going  
forward we should probably introduce some kind of error object for  
XMLHttpRequest.
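
Until such an error object exists, the most a page can do under this model is
check for null; a sketch, with a hypothetical URL:

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/api/data", true);
  xhr.responseType = "json";
  xhr.onload = function () {
    if (xhr.response === null) {
      // Parsing failed (or the body was the JSON literal null); the raw
      // text is not retained, so only the fact of failure can be reported.
      console.error("Failed to parse JSON response");
    }
  };
  xhr.send();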



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-11 Thread Jarred Nicholls
On Sun, Dec 11, 2011 at 5:08 AM, Tab Atkins Jr. jackalm...@gmail.comwrote:

 On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com
 wrote:
  I'd like to bring up an issue with the spec with regards to responseText
 +
  the new json responseType.  Currently it is written that responseText
  should throw an exception if the responseType is not  or text.  I
 would
  argue that responseText should also return the plain text when the type
 is
  json.
 
  Take the scenario of debugging an application, or an application that
 has a
  Error Reporting feature; If XHR.response returns null, meaning the JSON
  payload was not successfully parsed and/or was invalid, there is no
 means to
  retrieve the plain text that caused the error.  null is rather useless
 at
  that point.  See my WebKit bug for more
  context: https://bugs.webkit.org/show_bug.cgi?id=73648
 
  For legacy reasons, responseText and responseXML continue to work
 together
  despite the responseType that is set.  In other words, a responseType of
  text still allows access to responseXML, and responseType of document
  still allows access to responseText.  And it makes sense that this is
 so; if
  a strong-typed Document from responseXML is unable to be created,
  responseText is the fallback to get the payload and either debug it,
 submit
  it as an error report, etc.  I would argue that json responseType
 would be
  more valuable if it behaved the same.  Unlike the binary types
 (ArrayBuffer,
  Blob), json and document are backed by a plain text payload and
  therefore responseText has value in being accessible.
 
  If all we can get on a bad JSON response is null, I think there is
 little
  incentive for anyone to use the json type when they can use text and
  JSON.parse it themselves.

 What's the problem with simply setting responseType to 'text' when
 debugging?


This does not satisfy the use cases of error reporting w/ contextual data
nor the use case of debugging a runtime error in a production environment.



 A nice benefit of *not* presenting the text by default is that the
 browser can throw the text away immediately, rather than keeping
 around the payload in both forms and paying for it twice in memory
 (especially since the text form will, I believe, generally be larger
 than the JSON form).


Yes I agree, and it's what everyone w/ WebKit wants to try and accomplish.
 A good compromise would be to only throw it away (and thus restrict
responseText access) upon the first successful parse when accessing
.response.



 ~TJ




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-11 Thread Jarred Nicholls
On Sun, Dec 11, 2011 at 9:08 AM, Jarred Nicholls jar...@sencha.com wrote:

 On Sun, Dec 11, 2011 at 5:08 AM, Tab Atkins Jr. jackalm...@gmail.comwrote:

 On Sat, Dec 10, 2011 at 9:10 PM, Jarred Nicholls jar...@sencha.com
 wrote:
  I'd like to bring up an issue with the spec with regards to
 responseText +
  the new json responseType.  Currently it is written that responseText
  should throw an exception if the responseType is not  or text.  I
 would
  argue that responseText should also return the plain text when the type
 is
  json.
 
  Take the scenario of debugging an application, or an application that
 has a
  Error Reporting feature; If XHR.response returns null, meaning the
 JSON
  payload was not successfully parsed and/or was invalid, there is no
 means to
  retrieve the plain text that caused the error.  null is rather
 useless at
  that point.  See my WebKit bug for more
  context: https://bugs.webkit.org/show_bug.cgi?id=73648
 
  For legacy reasons, responseText and responseXML continue to work
 together
  despite the responseType that is set.  In other words, a responseType of
  text still allows access to responseXML, and responseType of
 document
  still allows access to responseText.  And it makes sense that this is
 so; if
  a strong-typed Document from responseXML is unable to be created,
  responseText is the fallback to get the payload and either debug it,
 submit
  it as an error report, etc.  I would argue that json responseType
 would be
  more valuable if it behaved the same.  Unlike the binary types
 (ArrayBuffer,
  Blob), json and document are backed by a plain text payload and
  therefore responseText has value in being accessible.
 
  If all we can get on a bad JSON response is null, I think there is
 little
  incentive for anyone to use the json type when they can use text and
  JSON.parse it themselves.

 What's the problem with simply setting responseType to 'text' when
 debugging?


 This does not satisfy the use cases of error reporting w/ contextual data
 nor the use case of debugging a runtime error in a production environment.


Given that most user agents send the payload to the console, the debugging
scenario is satisfied; so I renege on that.  Error reporting is still a
valid use case, albeit a rare requirement.





 A nice benefit of *not* presenting the text by default is that the
 browser can throw the text away immediately, rather than keeping
 around the payload in both forms and paying for it twice in memory
 (especially since the text form will, I believe, generally be larger
 than the JSON form).


 Yes I agree, and it's what everyone w/ WebKit wants to try and accomplish.
  A good compromise would be to only throw it away (and thus restrict
 responseText access) upon the first successful parse when accessing
 .response.



 ~TJ




 --
 

 *Sencha*
 Jarred Nicholls, Senior Software Architect
 @jarrednicholls
 http://twitter.com/jarrednicholls




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-11 Thread Jarred Nicholls
On Sun, Dec 11, 2011 at 6:55 AM, Anne van Kesteren ann...@opera.com wrote:

 On Sun, 11 Dec 2011 06:10:26 +0100, Jarred Nicholls jar...@sencha.com
 wrote:

 For legacy reasons, responseText and responseXML continue to work
 together despite the responseType that is set.


 This is false. responseType text allows access to responseText, but not
 responseXML. document allows access to responseXML, but not responseText.


I understand that's how you spec'ed it, but it's not how it's implemented
in IE or WebKit, for legacy purposes - which is what I meant in the above
statement.  Firefox (tested on 8) is the only one adhering to the spec as
you described above.  In WebKit and IE <= 9, a responseType of "", "text",
or "document" means access to both responseXML and responseText.  I don't
know what IE10's behavior is yet.

I'll be fighting a battle soon to get WebKit to be 100% compliant with the
spec - and it's hard to convince others (harder than it should be) to
change when IE doesn't behave in the same manner.  The use case of error
reporting w/ contextual data (i.e. the bad payload) is still unsatisfied,
but it's not a common scenario.



 We made this exclusive to reduce memory usage. I hope that browsers will
 report the JSON errors to the console


The raw network response is always logged in the console, so this is
satisfactory for debugging purposes, just not for realtime error handling.
I think an error object would be good.  JSON errors reported to the console
are unlikely to be seen unless a defined exception is thrown, per spec.


 and I think at some point going forward we should probably introduce some
 kind of error object for XMLHttpRequest.


One of the inconsistencies between browsers (including IE and WebKit) is
how/when exceptions are thrown when accessing different properties
(getResponseHeader, statusText, etc.).  The spec often says to fail
gracefully and return null or an empty string, etc., while IE and WebKit
tend to throw exceptions instead.  Perhaps an XHR error object would be
useful there.





 --
 Anne van Kesteren
 http://annevankesteren.nl/




-- 


*Sencha*
Jarred Nicholls, Senior Software Architect
@jarrednicholls
http://twitter.com/jarrednicholls


Re: [XHR] responseType json

2011-12-10 Thread Jarred Nicholls
I'd like to bring up an issue with the spec with regard to responseText +
the new "json" responseType.  Currently it is written that responseText
should throw an exception if the responseType is not "" or "text".  I would
argue that responseText should also return the plain text when the type is
"json".

Take the scenario of debugging an application, or an application that has an
Error Reporting feature; if XHR.response returns null, meaning the JSON
payload was not successfully parsed and/or was invalid, there is no means
to retrieve the plain text that caused the error.  null is rather useless
at that point.  See my WebKit bug for more context:
https://bugs.webkit.org/show_bug.cgi?id=73648

For legacy reasons, responseText and responseXML continue to work together
despite the responseType that is set.  In other words, a responseType of
"text" still allows access to responseXML, and a responseType of "document"
still allows access to responseText.  And it makes sense that this is so:
if a strongly typed Document from responseXML cannot be created,
responseText is the fallback to get the payload and either debug it, submit
it as an error report, etc.  I would argue that the "json" responseType would
be more valuable if it behaved the same.  Unlike the binary types
(ArrayBuffer, Blob), "json" and "document" are backed by a plain text
payload, and therefore responseText has value in being accessible.

If all we can get on a bad JSON response is null, I think there is little
incentive for anyone to use the "json" type when they can use "text" and
JSON.parse it themselves.

Comments, questions, and flames are welcomed!

Thanks,
Jarred


Re: [XHR] responseType json

2011-12-05 Thread Anne van Kesteren
On Sun, 04 Dec 2011 21:38:53 +0100, Bjoern Hoehrmann derhoe...@gmx.net  
wrote:

I did not reverse-engineer the current proposal, but my impression is that it
would handle "text" and "json" differently with respect to the Unicode
signature. I do not think that would be particularly desirable if true.


Thanks, fixed; that was an oversight:

http://dvcs.w3.org/hg/xhr/rev/edfeab9138a4
http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#json-response-entity-body


--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-05 Thread Anne van Kesteren
On Fri, 02 Dec 2011 14:00:26 +0100, Anne van Kesteren ann...@opera.com  
wrote:
I tied it to UTF-8 to further the fight on encoding proliferation and  
encourage developers to always use that encoding.


FYI, I also tied it to ECMAScript's definition of JSON, which has some  
restrictions in place that the JSON RFC does not have. Given that  
ECMAScript thus far had the only platform-based implementation of JSON it  
made sense for XMLHttpRequest to follow that.
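
For illustration, one place where the two definitions differ: ECMAScript's
JSON.parse accepts any JSON value at the top level, while RFC 4627's
"JSON-text = object / array" production does not.

  JSON.parse("5");       // 5, accepted by ECMAScript's JSON.parse
  JSON.parse("\"hi\"");  // "hi", likewise
  // A parser holding strictly to RFC 4627's grammar could reject both.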



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-05 Thread Glenn Adams
What do you mean by "treat content that clearly is UTF-32 as
UTF-16-encoded"? Do you mean interpreting it as a sequence of unsigned
shorts? That would be a direct violation of the semantics of UTF-32, would
it not?

I'm not advocating the use of UTF-32 for interchange, but it does have the
advantage of being fixed length encoding covering the entirety of Unicode.

On Sun, Dec 4, 2011 at 1:41 PM, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 * Henri Sivonen wrote:
 Browsers don't support UTF-32. It has no use cases as an interchange
 encoding beyond writing evil test cases. Defining it as a valid
 encoding is reprehensible.

 If UTF-32 is bad, then it should be detected as such and be rejected.
 The current idea, from what I can tell, is to ignore UTF-32 exists,
 and treat content that clearly is UTF-32 as UTF-16-encoded, which is
 much worse, as some components are likely to actually detect UTF-32,
 they would disagree with other components, and that tends to cause
 strange bugs and security issues. Thankfully, that is not a problem
 in this particular case.




Re: [XHR] responseType json

2011-12-05 Thread Bjoern Hoehrmann
* Glenn Adams wrote:
What do you mean by treat content that clearly is UTF-32 as
UTF-16-encoded? Do you mean interpreting it as a sequence of unsigned
shorts? That would be a direct violation of the semantics of UTF-32, would
it not?

Consider you have

  ...
  Content-Type: example/example;charset=utf-32

  FF FE 00 00 ...

Some would like to treat this as a UTF-16 encoded document starting with
U+0000 after the Unicode signature, even though it clearly is UTF-32.
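
Spelled out with made-up payload bytes:

  // "H" encoded as UTF-32LE, preceded by a Unicode signature:
  var bytes = [0xFF, 0xFE, 0x00, 0x00, 0x48, 0x00, 0x00, 0x00];
  // Read as UTF-32LE: signature (FF FE 00 00), then U+0048 ("H").
  // Read as UTF-16LE: signature (FF FE), then U+0000, U+0048, U+0000.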
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [XHR] responseType json

2011-12-05 Thread Glenn Adams
In the example you give, there is consistency between the content metadata
(charset param) and the content itself (as determined by sniffing). So why
would both the metadata and content be ignored?

If there were an inconsistency (but there isn't) then [1] would apply, in
which case the metadata can't be ignored without user consent.

[1] http://www.w3.org/TR/webarch/#metadata-inconsistencies

In any case, what is suggested below would be a direct violation of [2] as
well.

[2] http://www.w3.org/TR/charmod/#C030

On Mon, Dec 5, 2011 at 8:20 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 * Glenn Adams wrote:
 What do you mean by treat content that clearly is UTF-32 as
 UTF-16-encoded? Do you mean interpreting it as a sequence of unsigned
 shorts? That would be a direct violation of the semantics of UTF-32, would
 it not?

 Consider you have

  ...
  Content-Type: example/example;charset=utf-32

  FF FE 00 00 ...

 Some would like to treat this as a UTF-16 encoded document starting with
 U+0000 after the Unicode signature, even though it clearly is UTF-32.
 --
 Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
 Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/



Re: [XHR] responseType json

2011-12-05 Thread Ian Hickson
On Sun, 4 Dec 2011, Bjoern Hoehrmann wrote:
 
 The fight here is for standards.

The fight, if you want to characterise it as such, is for 
interoperability, not standards. Standards are just a tool we use today 
for that purpose.

For these purposes, we can ignore UTF-32. It's poorly implemented if at 
all, it's hardly ever used, and it provides no useful benefits for 
transport. Anything we can do to steer people more towards UTF-8 is a win.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: [XHR] responseType json

2011-12-05 Thread Glenn Maynard
On Mon, Dec 5, 2011 at 11:12 AM, Glenn Adams gl...@skynav.com wrote:

 In the example you give, there is consistency between the content metadata
 (charset param) and the content itself (as determined by sniffing). So why
 would both the metadata and content be ignored?


Because in the real world, UTF-32 isn't a transfer encoding.  Browsers
shouldn't have to waste time supporting it, and if someone accidentally
creates content in that encoding somehow, it should be immediately clear
that something is wrong.

It would take a major disconnect from reality to insist that browsers
support UTF-32.

 In any case, what is suggested below would be a direct violation of [2]
as well.

 [2] http://www.w3.org/TR/charmod/#C030

No, it wouldn't.  That doesn't say that UTF-32 must be recognized.

-- 
Glenn Maynard


Re: [XHR] responseType json

2011-12-05 Thread Glenn Maynard
On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams gl...@skynav.com wrote:

  [2] http://www.w3.org/TR/charmod/#C030


 No, it wouldn't.  That doesn't say that UTF-32 must be recognized.


 You misread me. I am not saying or supporting that UTF-32 must be
 recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2].


It's impossible to violate that rule if the encoding isn't recognized.
It only applies "when an IANA-registered charset name *is recognized*";
UTF-32 isn't recognized, so this is irrelevant.

If a browser doesn't support UTF-32 as an incoming interchange format, then
 it should treat it as any other character encoding it does not recognize.
 It must not pretend it is another encoding.


When an encoding is not recognized by the browser, the browser has full
discretion in guessing the encoding.  (See step 7 of
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.)
It's perfectly reasonable for UTF-32 data to be detected as UTF-16.  For
example, UTF-32 data is likely to contain null bytes when scanned bytewise,
and UTF-16 is the only supported encoding where that's likely to happen.
Steps 7 and 8 give browsers unrestricted freedom in selecting the encoding
when the previous steps are unable to do so; if they choose to include "if
the charset is declared as UTF-32, return UTF-16" as one of their
autodetection rules, the spec allows it.

-- 
Glenn Maynard


Re: [XHR] responseType json

2011-12-05 Thread Glenn Adams
Let me choose my words more carefully.

A browser may recognize UTF-32 (e.g., in a sniffer) without supporting it
(either internally or for transcoding into a different internal encoding).

If the browser supports UTF-32, then step (2) of [1] applies.

[1]
http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding

But, if the browser does not support UTF-32, then the table in step (4) of
[1] is supposed to apply, which would interpret the initial two bytes FF FE
as UTF-16LE according to the current language of [1], and further, return a
confidence level of certain.

I see the problem now. It seems that the table in step (4) should be
changed to interpret an initial FF FE as UTF-16LE only if the following two
bytes are not 00.
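
A sniffer with that ordering (UTF-32 signatures checked before UTF-16) might
look like the following sketch; it is illustrative only, not any browser's
actual detector:

  // bytes: array-like holding the first few octets of the payload.
  function sniffSignature(bytes) {
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) {
      if (bytes[2] === 0x00 && bytes[3] === 0x00) return "UTF-32LE"; // FF FE 00 00
      return "UTF-16LE";                                             // FF FE xx xx
    }
    if (bytes[0] === 0x00 && bytes[1] === 0x00 &&
        bytes[2] === 0xFE && bytes[3] === 0xFF) return "UTF-32BE";   // 00 00 FE FF
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "UTF-16BE";   // FE FF
    if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF)
      return "UTF-8";                                                // EF BB BF
    return "unknown";
  }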

On Mon, Dec 5, 2011 at 11:45 AM, Glenn Maynard gl...@zewt.org wrote:

 On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams gl...@skynav.com wrote:

  [2] http://www.w3.org/TR/charmod/#C030


 No, it wouldn't.  That doesn't say that UTF-32 must be recognized.


 You misread me. I am not saying or supporting that UTF-32 must be
 recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2].


 It's impossible to violate that rule if the encoding isn't recognized.
 When an IANA-registered charset name *is recognized*; UTF-32 isn't
 recognized, so this is irrelevant.

 If a browser doesn't support UTF-32 as an incoming interchange format,
 then it should treat it as any other character encoding it does not
 recognize. It must not pretend it is another encoding.


 When an encoding is not recognized by the browser, the browser has full
 discretion in guessing the encoding.  (See step 7 of
 http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.)
 It's perfectly reasonable for UTF-32 data to be detected as UTF-16.  For
 example, UTF-32 data is likely to contain null bytes when scanned bytewise,
 and UTF-16 is the only supported encoding where that's likely to happen.
 Steps 7 and 8 gives browsers unrestricted freedom in selecting the encoding
 when the previous steps are unable to do so; if they choose to include if
 the charset is declared as UTF-32, return UTF-16 as one of their
 autodetection rules, the spec allows it.

 --
 Glenn Maynard





Re: [XHR] responseType json

2011-12-05 Thread Ian Hickson
On Mon, 5 Dec 2011, Glenn Adams wrote:

 I see the problem now. It seems that the table in step (4) should be 
 changed to interpret an initial FF FE as UTF-16LE only if the following 
 two bytes are not 00.

The current text is intentional. UTF-32 is explicitly not supported by the 
HTML standard.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: [XHR] responseType json

2011-12-05 Thread Glenn Maynard
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams gl...@skynav.com wrote:

 But, if the browser does not support UTF-32, then the table in step (4) of
 [1] is supposed to apply, which would interpret the initial two bytes FF FE
 as UTF-16LE according to the current language of [1], and further, return a
 confidence level of certain.

 I see the problem now. It seems that the table in step (4) should be
 changed to interpret an initial FF FE as UTF-16LE only if the following two
 bytes are not 00.


That wouldn't actually bring browsers and the spec closer together; it
would actually bring them further apart.

At first glance, it looks like it makes the spec allow WebKit and IE's
behavior, which (unfortunately) includes UTF-32 detection, by allowing them
to fall through to step 7, where they're allowed to detect things however
they want.

However, that's ignoring step 5.  If step 4 passes through, then step 5
would happen next.  That means this carefully-constructed file would be
detected as UTF-8 by step 5:

http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding

That's not what happens in any browser; FF detects it as UTF-16 and WebKit
and IE detect it as UTF-32.  This change would require it to be detected as
UTF-8, which would have security implications if implemented, eg. a page
outputting escaped user-inputted text in UTF-32 might contain a string like
this, followed by a hostile script, when interpreted as UTF-8.

This really isn't worth spending time on; you've free to press this if you
like, but I'm moving on.

-- 
Glenn Maynard


Re: [XHR] responseType json

2011-12-05 Thread Glenn Adams
The problem as I see it is that the current spec text for charset detection
effectively *requires* a browser that does not support UTF-32 to
explicitly ignore content metadata that may be correct (if it specifies
UTF-32 as charset param), and further, to explicitly mis-label such content
as UTF-16LE in the case that the first four bytes are FF FE 00 00. Indeed,
the current algorithm requires mis-labelling such content as UTF-16LE with
a confidence of certain.

The current text is also ambiguous with respect to what "support" means in
step (2) of Section 8.2.2.1 of [1]. Which of the following is meant by
"support"?

   - recognize with sniffer
   - be capable of using directly as internal coding
   - be capable of transcoding to internal coding

[1]
http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding

On Mon, Dec 5, 2011 at 3:10 PM, Ian Hickson i...@hixie.ch wrote:

 On Mon, 5 Dec 2011, Glenn Adams wrote:
 
  I see the problem now. It seems that the table in step (4) should be
  changed to interpret an initial FF FE as UTF-16LE only if the following
  two bytes are not 00.

 The current text is intentional. UTF-32 is explicitly not supported by the
 HTML standard.

 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: [XHR] responseType json

2011-12-05 Thread Ian Hickson
On Mon, 5 Dec 2011, Glenn Adams wrote:

 The problem as I see it is that the current spec text for charset 
 detection effectively *requires* a browser that does not support 
 UTF-32 to explicitly ignore content metadata that may be correct (if it 
 specifies UTF-32 as charset param), and further, to explicitly mis-label 
 such content as UTF-16LE in the case that the first four bytes are FF FE 
 00 00. Indeed, the current algorithm requires mis-labelling such content 
 as UTF-16LE with a confidence of certain.

Yes, this is explicitly intentional.


 The current text is also ambiguous with respect to what support means 
 in step (2) of Section 8.2.2.1 of [1]. Which of the following are meant 
 by support?

To quote from the terminology section: "The specification uses the term 
'supported' when referring to whether a user agent has an implementation 
capable of decoding the semantics of an external resource."


- recognize with sniffer
- be capable of using directly as internal coding
- be capable of transcoding to internal coding

I don't know how to distinguish the latter two in a black-box manner. 
Either of the latter two is a correct interpretation as far as I can tell.

I suppose the current spec could be read such that the user agent could 
autodetect an unsupported encoding, but that wouldn't be very clever. I 
guess I can add some text to the spec to make that more obviously bad.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: [XHR] responseType json

2011-12-04 Thread Anne van Kesteren

On Fri, 02 Dec 2011 17:03:37 +0100, Henri Sivonen hsivo...@iki.fi wrote:

Does anyone actually transfer JSON as UTF-16?


Note that you cannot transmit UTF-16 JSON from a page (sending text is  
UTF-8 only) so not being able to receive it either seems fine. I actually  
think that we should make text UTF-8-only too to enforce this symmetry.
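
That is, a string handed to send() goes out UTF-8-encoded regardless of the
page's own encoding; for example (the /api URL is hypothetical):

  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/api", true);
  xhr.setRequestHeader("Content-Type", "application/json");
  // The string is encoded as UTF-8 on the wire, even if the page itself
  // was served as, say, Shift_JIS.
  xhr.send(JSON.stringify({ price: "\u20AC10" }));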



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [XHR] responseType json

2011-12-04 Thread Julian Reschke

On 2011-12-04 16:52, Anne van Kesteren wrote:

On Fri, 02 Dec 2011 17:03:37 +0100, Henri Sivonen hsivo...@iki.fi wrote:

Does anyone actually transfer JSON as UTF-16?


Note that you cannot transmit UTF-16 JSON from a page (sending text is
UTF-8 only) so not being able to receive it either seems fine. I
actually think that we should make text UTF-8-only too to enforce this
symmetry.


I think you're confusing client abilities with server abilities. Just 
because XHR can't *send* JSON as UTF-16 doesn't mean servers don't send 
it to clients.


It appears this needs more research.

Best regards, Julian



Re: [XHR] responseType json

2011-12-04 Thread Bjoern Hoehrmann
* Anne van Kesteren wrote:
I tied it to UTF-8 to further the fight on encoding proliferation and  
encourage developers to always use that encoding.

The fight here is for standards. You know, you read the specification,
create some content, and then that content works in all implementations
that claim to implement the specification as you would assume based on
reading the specification. You want to know how JSON content is handled
by reading the JSON specification, and not the documentation for each
and every JSON processor.

That said, there are a number of media types by now that use the +json
convention, which is not actually defined anywhere authoritatively, and it
is common to use media types other than application/json for JSON
content, like application/javascript, and the rules there vary. Should it
be possible to use the UTF-8 Unicode signature? Types differ on that, and
it seems likely that implementations do as well.

I did not reverse-engineer the current proposal, but my impression is that it
would handle "text" and "json" differently with respect to the Unicode
signature. I do not think that would be particularly desirable if true.

Anyway, given that it's difficult to tell which rules apply without some
specification for +json and other things, I can't find much wrong with
forcing the encoding to be UTF-8, especially because the other options
that the JSON specification allows would result in a fatal error, which
would be the same if implementations tried to detect the encoding but
then decided they do not support, say, UTF-32-encoded JSON. But it's not
clear to me that the Unicode signature should result in a fatal error
if you ignore what the JSON specification says about encodings.
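
The signature question is visible from script as well: if a decoder leaves
U+FEFF in place, ECMAScript's JSON.parse treats it as a syntax error rather
than whitespace. A quick illustration:

  JSON.parse('{"a": 1}');        // ok
  JSON.parse('\uFEFF{"a": 1}');  // SyntaxError: U+FEFF is not JSON whitespace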
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [XHR] responseType json

2011-12-04 Thread Bjoern Hoehrmann
* Henri Sivonen wrote:
Browsers don't support UTF-32. It has no use cases as an interchange
encoding beyond writing evil test cases. Defining it as a valid
encoding is reprehensible.

If UTF-32 is bad, then it should be detected as such and be rejected.
The current idea, from what I can tell, is to ignore UTF-32 exists,
and treat content that clearly is UTF-32 as UTF-16-encoded, which is
much worse, as some components are likely to actually detect UTF-32,
they would disagree with other components, and that tends to cause
strange bugs and security issues. Thankfully, that is not a problem
in this particular case.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [XHR] responseType json

2011-12-02 Thread Julian Reschke

On 2011-12-02 14:00, Anne van Kesteren wrote:

I added a json responseType
http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#the-responsetype-attribute
and JSON response entity body description:
http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#json-response-entity-body
This is based on a proposal by Gecko from a while back.

I tied it to UTF-8 to further the fight on encoding proliferation and
encourage developers to always use that encoding.


Well, it breaks legitimate JSON resources. What's the benefit?

Best regards, Julian



Re: [XHR] responseType json

2011-12-02 Thread Karl Dubost

On 2 Dec 2011, at 08:00, Anne van Kesteren wrote:
 I tied it to UTF-8 to further the fight on encoding proliferation and 
 encourage developers to always use that encoding.

Do we have stats on what is currently done on the Web with regards to the 
encoding?

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations  Tools, Opera Software




Re: [XHR] responseType json

2011-12-02 Thread Robin Berjon
On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote:
 I tied it to UTF-8 to further the fight on encoding proliferation and 
 encourage developers to always use that encoding.

That's a good fight, but I think this is the wrong battlefield. IIRC (valid) 
JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are 
detectable rather easily. The only thing this limitation is likely to bring is 
pain when dealing with resources outside one's control.
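
The easy detection Robin refers to matches the RFC 4627 trick of looking at
where zero octets fall in the first four bytes, which works because RFC 4627
JSON starts with two ASCII characters; a sketch:

  // b: array-like holding at least the first four octets of the payload.
  function detectJSONEncoding(b) {
    if (b[0] === 0 && b[1] === 0 && b[2] === 0) return "UTF-32BE"; // 00 00 00 xx
    if (b[0] === 0 && b[2] === 0)               return "UTF-16BE"; // 00 xx 00 xx
    if (b[1] === 0 && b[2] === 0 && b[3] === 0) return "UTF-32LE"; // xx 00 00 00
    if (b[1] === 0 && b[3] === 0)               return "UTF-16LE"; // xx 00 xx 00
    return "UTF-8";
  }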

-- 
Robin Berjon - http://berjon.com/ - @robinberjon




Re: [XHR] responseType json

2011-12-02 Thread Julian Reschke

On 2011-12-02 14:41, Robin Berjon wrote:

On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote:

I tied it to UTF-8 to further the fight on encoding proliferation and encourage 
developers to always use that encoding.


That's a good fight, but I think this is the wrong battlefield. IIRC (valid) 
JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are 
detectable rather easily. The only thing this limitation is likely to bring is 
pain when dealing with resources outside one's control.


If there's agreement that UTF-8 should be mandated for JSON then this 
should apply to *all* of JSON, not only this use case.


Best regards, Julian




Re: [XHR] responseType json

2011-12-02 Thread Henri Sivonen
On Fri, Dec 2, 2011 at 3:41 PM, Robin Berjon ro...@berjon.com wrote:
 On Dec 2, 2011, at 14:00 , Anne van Kesteren wrote:
 I tied it to UTF-8 to further the fight on encoding proliferation and 
 encourage developers to always use that encoding.

 That's a good fight, but I think this is the wrong battlefield. IIRC (valid) 
 JSON can only be in UTF-8,16,32 (with BE/LE variants) and all of those are 
 detectable rather easily. The only thing this limitation is likely to bring 
 is pain when dealing with resources outside one's control.

Browsers don't support UTF-32. It has no use cases as an interchange
encoding beyond writing evil test cases. Defining it as a valid
encoding is reprehensible.

Does anyone actually transfer JSON as UTF-16?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/